1. Introduction


In [4, 7] specialized array processors were proposed as a means of handling compute-bound problems in a cost-effective and efficient manner. These array processors generally consist of a regular array of simple, identical processing elements which operate synchronously. A host computer drives the array as a peripheral. The array can take many forms, for instance a linear array, a rectangular mesh, a hexagonal mesh, etc. The simplicity and regularity of these array processors render them suitable for VLSI implementation. High performance is achieved by extensive use of pipelining and multiprocessing.

A variety of algorithms have been designed for such arrays [1, 2, 5, 10]. An algorithm executing on such arrays comprises several data streams. A data stream is unidirectional, that is, it does not change its direction as it passes through processors in the array. Elements in distinct data streams move at different velocities (processors / cycle) while all elements in a given data stream move at the same velocity. Every processor in the array regularly receives data from each of the data streams, performs some short computation, and pumps the data out. The array communicates with the host through certain input/output ports designated as external input/output ports, and elements in distinct data streams are pumped in through distinct external input/output ports. We will henceforth refer to such algorithms as 'array algorithms'.

A few methodologies have been proposed for synthesizing array algorithms from program specifications [3, 6, 12]. However, none of these methodologies studies the synthesis problem in a formal framework. In this paper we study the synthesis of array algorithms in a more rigorous framework using a more intuitive representation of programs, namely, data-flow descriptions of programs. In particular we will be studying the synthesis of algorithms for a linear array. The array is comprised of identical processors, that is, they all execute the same set of instructions in every instruction cycle, and they are all simple, that is, they do not have any addressable local memory and cannot perform branching. The linear array is driven either by a single-phase or a two-phase global clock [8]. In a two-phase clocking scheme the two phases are nonoverlapping and adjacent processors are activated by opposite phases of the clock. Two reasons motivate our study of such a model. Firstly, this model has been used for most of the published array algorithms. Secondly, and more importantly, linear arrays require a fixed I/O bandwidth. Hence they can be attached as a peripheral to the I/O bus of any existing host without requiring any change to the host's I/O bandwidth.

We formalize this linear-array model and then define the program graphs that are appropriate for execution on it. A program graph is a directed acyclic graph representing a computation. The edges represent values and the nodes represent computation of a function whose arguments are the values represented by the incoming edges. We distinguish between correct mapping and correct execution of such program graphs on the linear array model. The structure of correctly mappable graphs is then examined. We also briefly mention the importance of using some semantic knowledge (that is, some property of the function represented by the nodes in the graph) to correctly execute the graph.

The  remainder  of  this  paper  is  organized  as  follows.  In  section  2  we  formalize  the  linear  array  and 
program  graph  models  appropriate  for  execution  on  the  linear  array.  We  also  provide  precise  definitions 
for  correct  mapping  and  correct  execution  of  program  graphs  on  the  linear  array.  In  section  3  we  examine 
the  structural  properties  of  correctly  mappable  program  graphs  and  support  the  formalisms  by 
synthesizing  a  few  published  and  some  novel  linear-array  algorithms. 

Since the proofs of the theorems are quite lengthy, and since the reader need not understand them in order to proceed, the details of the proofs are deferred to the Appendix.

2. Computational Models

We  begin  with  a  formal  definition  of  the  linear  array  that  captures  the  intuitive  linear  array  model 
described  in  the  previous  section. 

2.1.  Linear  Array  Model 

A linear array is a 3-tuple Ar = <N, L_Ar, Ψ_Ar> where:

1. N is a sequence of identical processors with indices ranging from 1 to |N|.

2. L_Ar = {l_1, l_2, .., l_k} is a set of labels.

3. Every processor in the array has k input ports and k output ports, with each input port and output port assigned a unique label l_j from L_Ar. Each processor in N is connected to its neighbors in the sequence through its I/O ports. In addition the first and last processors may have input and output ports connected to the host environment.

4. The array is driven either by a single-phase or a two-phase global clock. A phase can be viewed as the instruction cycle of a processor. In a single-phase clocking scheme all processors are activated in every phase and every processor computes a k-ary function Ψ_Ar. In a two-phase clocking scheme adjacent processors are activated during opposite phases of the clock and every processor computes Ψ_Ar in the phase in which it is active.

The function Ψ_Ar computed by a processor is a straight-line program. This restriction is imposed since we have assumed that a processor does not have any branching ability. We will henceforth refer to a processor in the array by its index in the sequence N. Let s be the index of a processor. Let si_t = <si_t^1, si_t^2, .., si_t^k> denote the k-tuple input to processor s at time t, where si_t^j is the value at the input port labelled l_j of processor s at time t. Let so_t = <so_t^1, so_t^2, .., so_t^k> denote the k-tuple output computed by processor s at time t, that is, Ψ_Ar(si_t) = so_t.

For any label l_j in L_Ar, let ρ_{l_j} be the neighborhood relation imposed by label l_j on processors in N. Let <s,r> be any pair of processors in N.

Definition 2.1: We shall say that processor s is related to processor r by label l_j, denoted as s ρ_{l_j} r, iff the output port labelled l_j of s is connected to the input port labelled l_j of r.

We  will  refer  to  a  path  of  uniform  labels  through  the  array  as  a  data  stream.  The  linear  array  has  the 
following  communication  features. 

1. A processor in the linear array can only communicate with up to two neighbors. All data streams are unidirectional. Hence for any label l_j in L_Ar, if ρ_{l_j} is not an empty relation, then a neighborhood constant n_{l_j} is associated with l_j such that the output port labelled l_j of any processor s is connected to the input port labelled l_j of s+n_{l_j}, where n_{l_j} is one of {1, -1, 0}.

2. The elements in a data stream move at a constant velocity, and hence a positive delay constant d_{l_j} is associated with every label l_j in L_Ar such that for any processor s, if so_t is the output computed by s at time t then so_t^j appears at the input port labelled l_j of processor s+n_{l_j} at time t+d_{l_j}.

3. External communication takes place through certain designated input/output ports, namely:

a. if ρ_{l_j} is empty then the input port and output port labelled l_j of every processor communicate with the host,

b. if n_{l_j} = 1 then the input port labelled l_j of processor 1 and the output port labelled l_j of processor |N| communicate with the host,

c. if n_{l_j} = -1 then the input port labelled l_j of processor |N| and the output port labelled l_j of processor 1 communicate with the host,

d. if n_{l_j} = 0 then a register in every processor serves as the input/output port labelled l_j. No input/output port labelled l_j communicates with the host. A value is preloaded into this register before starting the computation and the result value (the preloaded value may be updated as computation progresses) is retrieved from this register after the computation terminates.

We  will  call  the  input/output  ports  that  communicate  with  the  host  external  input/output  ports. 

The delay d_{l_j} can be implemented as a queue using a shift register of length d_{l_j}-1 if single-phase clocking is used and of length (d_{l_j}-1)/2 if two-phase clocking is used. At any time t, then, an activated processor s in the array performs the following sequence of operations:

1. Compute Ψ_Ar(si_t) = so_t where si_t = <si_t^1, si_t^2, .., si_t^k> and so_t = <so_t^1, so_t^2, .., so_t^k>.

2. For every label l_j, dequeue the element at the head of the queue associated with l_j and place it at the output port labelled l_j of s.

3. For every label l_j, place so_t^j at the tail of the queue associated with l_j.

Figure 2.1 illustrates a linear array with n_{l_1} = 1, n_{l_2} = -1 and n_{l_3} = 0. The neighborhood relation ρ_{l_4} imposed by label l_4 is empty.


Figure 2.1


Henceforth, 'linear array' (or 'linear arrays') in the rest of this paper will refer to the model defined above.
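The cycle-by-cycle behaviour of this model can be sketched as a small simulator. This is a minimal sketch under stated assumptions, not the paper's own formulation: the class name `LinearArray`, zero-based processor indices, the use of `None` for an empty port, the `external` argument that stands in for the host, and the restriction to single-phase clocking are all illustrative choices. The queue is filled before it is drained, a small deviation from the order of operations listed above, so that a delay of 1 works with an empty shift register.

```python
from collections import deque


class LinearArray:
    """Single-phase linear array; processors are indexed 0..size-1 here."""

    def __init__(self, size, n, d, psi):
        # n[j]: neighborhood constant of label j (1, -1, 0, or None when the
        #       neighborhood relation of label j is empty)
        # d[j]: positive delay constant of label j
        # psi:  the k-ary processor function, mapping a k-tuple to a k-tuple
        self.size, self.n, self.d, self.psi = size, n, d, psi
        self.k = len(n)
        # One shift register of length d[j] - 1 per processor and label.
        self.queues = [[deque([None] * (d[j] - 1)) for j in range(self.k)]
                       for _ in range(size)]
        self.inputs = [[None] * self.k for _ in range(size)]

    def step(self, external=None):
        """Advance one cycle; external maps (processor, label) to a host value."""
        external = external or {}
        ports = []
        for s in range(self.size):
            si = tuple(external.get((s, j), self.inputs[s][j])
                       for j in range(self.k))
            so = self.psi(si)
            out = []
            for j in range(self.k):
                q = self.queues[s][j]
                q.append(so[j])          # enqueue first so that d = 1 works
                out.append(q.popleft())  # value computed d - 1 cycles ago
            ports.append(out)
        # Move every output port value to the input port it is wired to.
        self.inputs = [[None] * self.k for _ in range(self.size)]
        for s in range(self.size):
            for j in range(self.k):
                if self.n[j] is None:
                    continue  # this label only talks to the host
                r = s + self.n[j]
                if 0 <= r < self.size:
                    self.inputs[r][j] = ports[s][j]
        return ports
```

With one label, neighborhood constant 1 and delay 2, a value fed to the first of three processors at cycle 0 reaches the output port of the last processor at cycle 5, that is, after three hops of two cycles each minus the final transfer.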

2.2.  Homogeneous  Graphs 

The linear array is comprised of identical processors all of which compute the same function (or execute the same instruction) in every cycle. All the processors in the array cooperate in executing a single program. As all the processors in the array are identical, the straight-line programs they execute must also be identical. This motivates the following formalization of programs appropriate for execution on linear arrays.

A homogeneous program graph G = <V, E, L_G> is a labelled DAG where:

1. V = V_G ∪ SO_G ∪ SI_G, and V_G, SO_G and SI_G are three disjoint sets of vertices with SO_G the set of source vertices, SI_G the set of sink vertices and V_G the set of remaining vertices, which we shall call computation vertices,

2. L_G is a set of labels. Let |L_G| = k, and

3. every vertex in V_G has k incoming edges and k outgoing edges, where each incoming and outgoing edge is assigned a unique label from L_G.

Input  edges  and  output  edges  in  G  are  those  edges  that  are  directed  out  of  and  into  source  and  sink 
vertices  respectively. 

In any execution of G on a linear array, every computation vertex in G is a single instance of a function evaluation that is performed in a cycle by a processor in the array. Hence the function represented by a computation vertex must be a straight-line program, and we can view the k incoming edges and the k outgoing edges of a vertex v_x as representing the k-tuple input value to, and the k-tuple output value computed by, the processor that evaluates v_x. A source vertex, then, represents an input value and a sink vertex represents an output value. As every computation vertex represents the same function, we refer to these program graphs as Homogeneous Graphs.
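The labelling requirement on computation vertices can be checked mechanically. The following is a minimal sketch, assuming a graph is given as a list of `(source, target, label)` triples; the function name `is_homogeneous` and the sample vertex names are hypothetical, and the check covers only condition 3 of the definition (it does not test acyclicity or the disjointness of the vertex sets).

```python
from collections import defaultdict


def is_homogeneous(computation_vertices, edges, labels):
    """Check condition 3 of the definition for every computation vertex.

    edges is an iterable of (source, target, label) triples.
    """
    incoming = defaultdict(list)
    outgoing = defaultdict(list)
    for u, v, label in edges:
        outgoing[u].append(label)
        incoming[v].append(label)
    expected = sorted(labels)
    # Each label must occur exactly once among the incoming edges and
    # exactly once among the outgoing edges of a computation vertex.
    return all(sorted(incoming[v]) == expected and sorted(outgoing[v]) == expected
               for v in computation_vertices)


# Hypothetical two-label graph with a single computation vertex "v".
edges = [("s1", "v", "l1"), ("s2", "v", "l2"),
         ("v", "t1", "l1"), ("v", "t2", "l2")]
ok = is_homogeneous({"v"}, edges, {"l1", "l2"})
```

Dropping any one of the four edges makes the check fail, since the vertex then lacks an incoming or outgoing edge for one label.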

Figure 2.2 illustrates a homogeneous graph. The solid and dashed horizontal edges are labelled l_1 and l_2 respectively. The vertical and oblique edges are labelled l_3 and l_4 respectively.


Figure 2.2

In Figure 2.2 and in all the other graphs illustrated in this paper we will be using 'o' to represent computation vertices and 'x' to denote source and sink vertices.

Although homogeneous graphs are a more limited class of program graphs than, for instance, general dataflow graphs, they do allow the representation of quite a number of interesting programs which are potentially suitable for execution on the linear array model. As we shall see, not even all homogeneous graph programs can be executed on the simple computing engines we have defined.

Henceforth  we  will  assume  the  following: 

1.  G  is  a  homogeneous  graph. 

2.  The  label  of  a  source  (sink)  vertex  is  the  same  as  that  of  the  input  (output)  edge  directed  out 
of  the  source  (directed  into  the  sink)  vertex. 

3.  Input  (output)  value  will  always  refer  to  the  value  represented  by  a  source  (sink)  vertex. 

2.3. Mapping Homogeneous Graphs


We now give a precise formulation of correct mapping and correct execution of homogeneous graphs on linear arrays. Intuitively, a mapping of G onto a linear array Ar assigns each computation vertex of G to a processor in Ar at a particular time step and also fixes the delay and neighborhood constant for every label in L_G. Assuming discrete time steps, let T = {0,1,2,..} be the sequence of natural numbers representing the progress of computation from its start at time 0.

Definition 2.2: A mapping of G onto a linear array Ar is a 4-tuple <PA, TA, NA, DA> where:

1. PA: V_G → N and TA: V_G → T are many-one functions mapping computation vertices onto processors and time steps respectively.

2. Let I+ be the set of positive integers. NA: L_G → {1,-1,0} and DA: L_G → I+ are many-one functions assigning neighborhood constants and delays to labels respectively.

[Note: NA(l_j) = n_{l_j} and DA(l_j) = d_{l_j}.]

We  next  formalize  a  correct  mapping. 

Definition 2.3: A mapping is syntactically correct iff

1. for every l_j ∈ L_Ar and for any pair of computation vertices v_x and v_y, if there is an edge labelled l_j directed from v_x to v_y, then PA(v_y) = PA(v_x) + n_{l_j} and TA(v_y) = TA(v_x) + d_{l_j}, and

2. no two input/output values can appear simultaneously at the same input port of a processor.
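Condition (1) is a purely local test on edges, so it is easy to verify for a candidate mapping. The sketch below is illustrative only: the function name `satisfies_condition_1`, the dictionary encoding of PA, TA, NA and DA, and the three-vertex chain are assumptions, and condition (2) is not checked.

```python
def satisfies_condition_1(edges, PA, TA, NA, DA):
    """edges: (v_x, v_y, label) triples between computation vertices."""
    return all(PA[vy] == PA[vx] + NA[label] and TA[vy] == TA[vx] + DA[label]
               for vx, vy, label in edges)


# Hypothetical chain v1 -> v2 -> v3 along label "l1".
edges = [("v1", "v2", "l1"), ("v2", "v3", "l1")]
PA = {"v1": 1, "v2": 2, "v3": 3}   # processor assigned to each vertex
TA = {"v1": 0, "v2": 1, "v3": 2}   # time step assigned to each vertex
NA = {"l1": 1}                     # neighborhood constant of l1
DA = {"l1": 1}                     # delay constant of l1
chain_ok = satisfies_condition_1(edges, PA, TA, NA, DA)
```

Here every edge moves one processor to the right in one time step, which matches a neighborhood constant of 1 and a delay of 1; shifting the time of any single vertex breaks the condition.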

Let i be the input value represented by the source vertex of a computation vertex, say, v_x. Similarly, let o be the output value represented by the sink vertex of another computation vertex, say, v_y. Without loss of generality, let the labels of the source and sink vertices be l_j. Now i is fed into the array and o is retrieved from the array through the external input port and external output port respectively associated with label l_j. Let TA(v_x) = t_1 and TA(v_y) = t_2.

Definition 2.4: The Entry Time for i and the Exit Time for o are the times at which i is fed into and o is retrieved from the array respectively. The Consumption Time of i and the Production Time of o are t_1 and t_2 + d_{l_j} respectively.

We are now in a position to introduce the notion of correct execution of homogeneous graphs.

Definition 2.5: G is correctly executed on a linear array if the following two conditions hold:

1.  the  mapping  is  syntactically  correct,  and 

2.  for  every  input  value  its  value  at  entry  and  consumption  times  must  be  the  same  and  for 
every  output  value  its  value  at  production  and  exit  times  must  be  the  same. 

Intuitively, condition (2) means that a value input to (output by) the array may have to be kept constant as it passes through some number of processors, in order that it arrive unchanged at the processor that will use it (the external output port from which it will be retrieved).


3.  Syntactic  Characterization 

In  this  section  we  identify  the  structure  of  homogeneous  graphs  for  which  there  exist  syntactically 
correct  mappings.  For  notational  simplicity  we  will  be  using  the  following  conventions. 

1. Computation vertices will sometimes be referred to simply as 'vertices'.

2. A pair of vertices will always refer to a distinct pair of computation vertices unless specified otherwise.

3. A path will always refer to an undirected path between any pair of computation vertices. A path will always consist of a sequence of distinct vertices unless it is a cycle, in which case the first and last vertices are the same.

4. In any connected subgraph there exists a path between every pair of vertices in the subgraph through edges in the subgraph.

5. A maximally connected subgraph (that is, if there exists a path between any pair of vertices such that one of them is in the subgraph then the other must also be in the same subgraph) will be referred to as a connected component.

6. A syntactically correct mapping will sometimes be referred to simply as a 'correct mapping'.

We  now  identify  the  relevant  structural  elements  of  G. 

Definition 3.1: For any label l_j in G, a major path labelled l_j is a directed path from a source vertex to a sink vertex such that the label of the source vertex, of the sink vertex and of all the edges in the path is l_j.

The  path  label  of  a  major  path  is  the  label  of  the  edges  in  the  path. 

Definition  3.2:  Two  major  paths  are  identical  iff,  ignoring  the  source  and  sink  vertices  in  them,  the 
two  directed  paths  are  the  same. 

For any label l_j, let E_{l_j} = {major paths having the path label l_j}. Not every E_{l_j} is relevant for a syntactic characterization of homogeneous graphs. Consequently, we divide the labels of G into three classes:

1. L_1 = {l_j | there exists a pair of computation vertices v_x and v_y and a directed edge e=<v_x,v_y> whose label is l_j. Besides, for any l_i and l_j in L_1 there exists a major path in E_{l_i} that is not identical to any major path in E_{l_j}.} The major paths with these labels are relevant for structural characterization of correctly mappable graphs.

2. L_2 = {l_j | there exists a pair of computation vertices v_x and v_y and a directed edge e=<v_x,v_y> whose label is l_j. Besides, if l_j is in L_2 then there exists an l_i in L_1 such that for every major path in E_{l_j} there is an identical major path in E_{l_i}.} Given the major paths associated with the labels in L_1, the major paths associated with those in this class are redundant for structural characterization.

3. L_3 = {l_j | there exists no pair of computation vertices v_x and v_y such that there is a directed edge e=<v_x,v_y> whose label is l_j}.

Consider the graph in Figure 2.2 again. The solid and dashed horizontal edges are labelled l_1 and l_2 respectively. The vertical and oblique edges are labelled l_3 and l_4 respectively. L_1 = {l_1, l_3}, L_2 = {l_2} and L_3 = {l_4}.

Henceforth, throughout the rest of this paper, labels will be assumed to be in L_1 unless explicitly mentioned otherwise.

Definition 3.3: A minimally labelled connected component SG of G is a 3-tuple <V_SG, E_SG, L_SG> where V_SG ⊆ V, E_SG ⊆ E, L_SG ⊆ L_1 and V_G ⊆ V_SG (that is, all the computation vertices in G are contained in V_SG). Besides, for any l_j ∈ L_SG, if all the edges labelled l_j in E_SG are removed then SG is disconnected.

[Note: Unlike a minimally labelled connected component, a connected component need not include all the computation vertices.]

We will henceforth refer to L_SG as the minimal label set of a graph G.

We have now developed the appropriate formal machinery to undertake a systematic analysis of the structure of program graphs, and we begin by examining graphs that have exactly one label in their minimal label set. In particular let L_SG = {l_μ}. This means that there exists a path between every pair of computation vertices through edges labelled l_μ. G is a homogeneous graph and hence there is only one pair of incoming and outgoing edges labelled l_μ at any computation vertex. Consequently, there exists only one major path labelled l_μ in G and the path labels of all other major paths are either in L_2 or in L_3. Figure 3.1 illustrates such a graph.


Figure 3.1


In Figure 3.1 the solid and dotted horizontal edges are labelled l_1 and l_2 respectively. The vertical edges are labelled l_3. L_1 = {l_1}, L_2 = {l_2} and L_3 = {l_3}. Mapping such a graph is straightforward.

3.1. Θ Graphs

We next examine graphs which have two labels in their minimal label set. We denote the class of such graphs as Θ graphs. Θ is a large class that includes homogeneous program graphs for important computational problems like sorting, convolution, vector multiplication of band matrices, pattern matching, priority queue, linear recurrence, filtering, etc.

In particular, let L_SG = {l_μ, l_ν}. G ∈ Θ signifies that there is a path between any pair of computation vertices in G through edges that are labelled l_μ or l_ν. The structure imposed on SG by any correct mapping is elegantly formalized below.

Definition 3.4: Let I_1 and I_2 be two sequences of integers such that the sequences in I_1 and I_2 range from 0 to h_1 and 0 to h_2 respectively, and let S ⊆ I_1 × I_2. Then, SG is a Mesh Graph iff there exists a one-one function F: V_G → S such that the following property holds. Let F_{l_μ} and F_{l_ν} be the projection functions of F, that is, for any v_x in V_G, if F(v_x) = <m,n> then F_{l_μ}(v_x) = m and F_{l_ν}(v_x) = n. For any v_x and v_y in V_G, there exists a directed path from v_x to v_y in a major path whose path label is l_μ such that the distance from v_x to v_y in this directed path is d iff F_{l_μ}(v_y) = F_{l_μ}(v_x) + d and F_{l_ν}(v_y) = F_{l_ν}(v_x). A similar condition holds for a major path whose path label is l_ν.

Henceforth we will denote F_{l_μ}(v_x) and F_{l_ν}(v_x) as x_{l_μ} and x_{l_ν} respectively.

Figure 3.2 is an example of a Mesh Graph wherein the horizontal and vertical major paths are labelled l_μ and l_ν respectively.


Figure 3.2


We  relate  the  structure  of  SG  to  the  existence  of  a  syntactically  correct  mapping  in  the  following 
Theorem. 

Theorem  3.1:  If  there  exists  a  syntactically  correct  mapping  for  G  then  SG  must  be  a  Mesh  Graph. 
Proof: See Appendix.

□ 

When G is finally mapped onto a linear array the computation vertices in G may be partitioned into sets that comprise vertices which are mapped onto the same physical processor. As we will see later on, this is useful in expressing the structure of correctly mappable graphs in a simple way. To formalize this partitioning it is useful to define a Diagonalization of the Mesh Graph SG as follows.

Definition 3.5: Let w = <w_1, w_2> ∈ {<1,1>, <1,-1>, <1,0>, <0,1>}. A Diagonalization of SG is a pair <D,w> with the following properties.

1. D = {D_1, D_2, .., D_k} is a family of ordered sets of computation vertices and D_1 ∪ D_2 ∪ .. ∪ D_k = V_G.

2. For any D_p in D, if v_x and v_y are in D_p then w_1 x_{l_μ} + w_2 x_{l_ν} = w_1 y_{l_μ} + w_2 y_{l_ν}.

3. Let τ_D denote the indexing function associated with the ordered set D. For any pair of D_p and D_q in D, if v_x and v_y are in D_p and D_q respectively then τ_D(D_p) < τ_D(D_q) iff w_1 x_{l_μ} + w_2 x_{l_ν} < w_1 y_{l_μ} + w_2 y_{l_ν}.

Henceforth, we will refer to D as the set of Main Diagonals and to w as the Main Diagonalization Factor. We will assume that the indices assigned to the diagonals in D range from 1 to |D| and if D_p is a diagonal in D then τ_D(D_p) = p, that is, the index of D_p in the ordering is p. We use the ordering of the diagonals in D to define an adjacency relation imposed on them by labelled edges.

Definition 3.6: Let D_p and D_q be in D. D_p a_{l_j} D_q (read 'D_p is related by a_{l_j} to D_q') iff there exists a computation vertex v_x in D_p and another computation vertex v_y in D_q and a directed edge e=<v_x,v_y> whose label is l_j.

Definition 3.7: a_{l_j} is consistent with respect to τ_D iff there exists a constant m_{l_j} such that for all D_p ∈ D and D_q ∈ D, if D_p a_{l_j} D_q then τ_D(D_q) = τ_D(D_p) + m_{l_j}.

We will call m_{l_j} the consistency constant of a_{l_j}. Let S_D = {a_{l_j} | l_j ∈ L_1 and a_{l_j} is the adjacency relation on D imposed by edges labelled l_j}.
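Consistency of an adjacency relation amounts to every edge with a given label crossing the same number of diagonals. A minimal sketch of that test follows; the function name `consistency_constant`, the encoding of edges as `(v_x, v_y, label)` triples, the label names `"mu"` and `"nu"`, and the 2 x 2 example mesh are assumptions made for illustration. The function returns `None` both when the relation is inconsistent and when no edge carries the label.

```python
def consistency_constant(edges, index, label):
    """Return the consistency constant for label, or None if none exists.

    edges: (v_x, v_y, label) triples; index: vertex -> index of its diagonal.
    """
    diffs = {index[vy] - index[vx] for vx, vy, lab in edges if lab == label}
    return diffs.pop() if len(diffs) == 1 else None


# Hypothetical 2 x 2 mesh with coordinates a=(0,0), b=(1,0), c=(0,1), d=(1,1).
# With w = <1,-1> the diagonal index grows with x_mu - x_nu, which gives the
# indices 2, 3, 1, 2 below.
index = {"a": 2, "b": 3, "c": 1, "d": 2}
edges = [("a", "b", "mu"), ("c", "d", "mu"),
         ("a", "c", "nu"), ("b", "d", "nu")]
m_mu = consistency_constant(edges, index, "mu")
m_nu = consistency_constant(edges, index, "nu")
```

In this mesh the horizontal edges always advance one diagonal and the vertical edges always go back one, so both relations are consistent, with constants 1 and -1.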

It is useful to define the set D^c of Complementary Diagonals that is obtained by diagonalizing SG by its Complementary Diagonalization Factor w^c, where w^c = <0,1> when w ∈ {<1,1>, <1,-1>, <1,0>} and w^c = <1,0> when w = <0,1>.

Let τ_{D^c} denote the indexing function associated with D^c and S_{D^c} = {b_{l_j} | l_j ∈ L_1 and b_{l_j} is the adjacency relation on D^c imposed by edges labelled l_j}. Here also we will assume that the indices of the complementary diagonals in D^c range from 1 to |D^c| and if D^c_p is a complementary diagonal in D^c then its index is p. Consistency of b_{l_j} with respect to τ_{D^c} is defined similarly to that of a_{l_j}. Let c_{l_j} denote the consistency constant of b_{l_j}.

Consider Figure 3.2 again. Let w = <1,-1> and so w^c = <0,1>. Then the set of main diagonals D = {D_1, D_2, D_3, D_4} is comprised of four diagonals where D_1 = {v_6}, D_2 = {v_3}, D_3 = {v_1, v_4} and D_4 = {v_2, v_5}. The set of complementary diagonals D^c = {D^c_1, D^c_2, D^c_3} is comprised of three diagonals where D^c_1 = {v_1, v_2}, D^c_2 = {v_3, v_4, v_5} and D^c_3 = {v_6}.
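A diagonalization can be computed directly from mesh coordinates by grouping vertices on the value w_1 x_{l_μ} + w_2 x_{l_ν} and ordering the groups by that value. The sketch below is illustrative: the function name `diagonalize` is hypothetical, and the coordinates are an assumption, chosen as one six-vertex placement that yields four main and three complementary diagonals for w = <1,-1>, since the figure itself is not reproduced here.

```python
def diagonalize(coords, w):
    """Group vertices into diagonals ordered by w1*x_mu + w2*x_nu.

    coords: vertex -> (x_mu, x_nu); returns a list of diagonals, where the
    diagonal at list position p - 1 has index p.
    """
    groups = {}
    for v, (x_mu, x_nu) in coords.items():
        groups.setdefault(w[0] * x_mu + w[1] * x_nu, []).append(v)
    return [sorted(groups[key]) for key in sorted(groups)]


# Assumed coordinates for a six-vertex mesh graph.
coords = {"v1": (0, 0), "v2": (1, 0), "v3": (0, 1),
          "v4": (1, 1), "v5": (2, 1), "v6": (0, 2)}
main = diagonalize(coords, (1, -1))   # main diagonals for w = <1,-1>
comp = diagonalize(coords, (0, 1))    # complementary diagonals, w^c = <0,1>
```

With these coordinates `main` is `[["v6"], ["v3"], ["v1", "v4"], ["v2", "v5"]]` and `comp` is `[["v1", "v2"], ["v3", "v4", "v5"], ["v6"]]`.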

Let v_x and v_y be two vertices in the main diagonals D_p and D_q respectively and complementary diagonals D^c_s and D^c_r respectively. Then we will denote the difference in indices of D_q and D_p, which is q-p, as Δ_D(v_x,v_y). We will also denote the difference in indices of D^c_r and D^c_s, which is r-s, as Δ_{D^c}(v_x,v_y).

We next define two classes of graphs Θ_1 ⊆ Θ and Θ_2 ⊆ Θ where:

Θ_1 = {G ∈ Θ | SG is a Mesh Graph and the main diagonalization factor w of SG is one of {<1,-1>, <0,1>, <1,0>}} and

Θ_2 = {G ∈ Θ | SG is a Mesh Graph and the main diagonalization factor w of SG is <1,1>}.

We provide a complete syntactic characterization of program graphs in Θ_1 which have syntactically correct mappings in the following Theorem. Before doing so we introduce the notion of transitive edges, which is needed in the proof sketch of the Theorem.

Definition 3.8: Let e=<v_x,v_y> be a directed edge from vertex v_x to vertex v_y. Then e is a transitive edge iff there exists a vertex v_z and edges e_m=<v_x,v_z> and e_n=<v_z,v_y>.
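Transitive edges in this sense are edges that are shortcut by a two-edge path, and they can be found with a successor lookup. This is a minimal sketch; the function name `transitive_edges` and the representation of edges as `(v_x, v_y)` pairs without labels are assumptions for illustration.

```python
def transitive_edges(edges):
    """Return the set of edges <v_x, v_y> for which some v_z has edges
    <v_x, v_z> and <v_z, v_y>.

    edges: iterable of (v_x, v_y) pairs of a directed graph.
    """
    edges = list(edges)
    successors = {}
    for u, v in edges:
        successors.setdefault(u, set()).add(v)
    return {(u, v) for u, v in edges
            if any(v in successors.get(z, ()) for z in successors[u])}


# Hypothetical triangle: a -> c is shortcut by a -> b -> c.
shortcut = transitive_edges([("a", "b"), ("b", "c"), ("a", "c")])
```

In the triangle only the edge from `a` to `c` is transitive; the two edges of the longer path are not.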

Theorem 3.2: Let G ∈ Θ_1. There exists a syntactically correct mapping for G if and only if there exists a pair <D,D^c> such that each of the following conditions is satisfied:

1. Every relation a_{l_j} ∈ S_D must be consistent with respect to τ_D and its consistency constant m_{l_j} is one of {1,-1,0}.

2. Every relation b_{l_j} ∈ S_{D^c} must be consistent with respect to τ_{D^c}.

3. Let v_x and v_y be any two computation vertices. For any label l_j, if c_{l_j} Δ_D(v_x,v_y) = m_{l_j} Δ_{D^c}(v_x,v_y) then there must be a major path labelled l_j passing through v_x and v_y.

Intuitively,  condition  (1)  ensures  that  a  data  stream  is  unidirectional  and  communication  takes  place 
only  between  adjacent  processors  while  condition  (2)  ensures  that  a  data  stream  moves  at  constant 
velocity  and  condition  (3)  ensures  that  no  two  values  appear  simultaneously  at  the  input  port  of  any 
processor. 

We  sketch  the  construction  used  in  the  sufficiency  proof  as  this  construction  is  used  to  illustrate 
synthesis  of  linear-array  algorithms  later  on. 

Proof: (Only If): See Appendix for details.

(If Part): Let D = {D_1, D_2, .., D_n} be the set of main diagonals where i denotes the index of any D_i ∈ D. Construct a linear array Ar with |N| = n. Now construct a mapping through the following steps.

1. Choose two-phase clocking if there exists a transitive edge labelled l_j such that m_{l_j} = 0, or else choose a single-phase clocking scheme.

2. Let D_q be any diagonal in D and let v_x be any computation vertex in D_q. Then, let PA(v_x) = q. This assigns computation vertices to processors.

3. Next fix the neighborhood constant n_{l_j} and delay constant d_{l_j} for every label l_j in L_1. Let n_{l_j} = m_{l_j}. Let d_a and d_b be two constants which we will be using in the construction of the delays for the labels in L_1. If the main diagonalization factor w is <1,-1> or there exists a transitive edge labelled l_j such that m_{l_j} = 0 then let d_a = 2, else let d_a = 1. Let c_min be the minimum of all consistency constants among all the relations in S_{D^c}. If c_min > 0 then set d_b = 1, else set d_b = 1 + |c_min| d_a. Let d_{l_j} = c_{l_j} d_a + m_{l_j} d_b.

4. Next construct the neighborhood and delay constants for the labels in L_2. By definition of L_2, if there exists a label l_j in L_2 then there must exist some label l_i in L_1 such that for every major path in E_{l_j} there is an identical major path in E_{l_i}. Hence let n_{l_j} = n_{l_i} and d_{l_j} = d_{l_i}.

5. For every l_j in L_3, let the neighborhood relation imposed by label l_j on processors in N be empty, and hence no processor's output port labelled l_j is connected to the input port labelled l_j of any processor.

6. Construct the function TA which assigns computation vertices to time steps. Let v_s be the computation vertex which is in D_1 ∈ D and D^c_1 ∈ D^c. Let TA(v_s) = t_0. Let v_x be any computation vertex in D_p ∈ D and D^c_q ∈ D^c. Then, let TA(v_x) = t_0 + (q-1) d_a + (p-1) d_b.

Steps 1 to 6 above complete the construction of a correct mapping. Refer to the Appendix to verify that the mapping is correct.

□ 
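The construction in the sufficiency proof can be sketched for the labels in L_1 as follows. This is a sketch under stated assumptions rather than the paper's algorithm verbatim: the function name `construct_mapping` and its argument encoding are hypothetical, only steps 2, 3 and 6 are covered, and the delay expression c_{l_j} d_a + m_{l_j} d_b is an inference from step 6 combined with condition (1) of Definition 2.3, not a formula quoted from the source.

```python
def construct_mapping(main_index, comp_index, m, c, w,
                      zero_transitive_edge=False, t0=0):
    """Build (PA, TA, NA, DA) from diagonal indices and consistency constants.

    main_index / comp_index: vertex -> 1-based index of its main /
    complementary diagonal; m / c: label -> consistency constant with
    respect to the main / complementary diagonals.
    """
    d_a = 2 if (w == (1, -1) or zero_transitive_edge) else 1
    c_min = min(c.values())
    d_b = 1 if c_min > 0 else 1 + abs(c_min) * d_a
    PA = dict(main_index)                                   # step 2
    NA = dict(m)                                            # step 3
    DA = {lab: c[lab] * d_a + m[lab] * d_b for lab in m}    # assumed formula
    TA = {v: t0 + (comp_index[v] - 1) * d_a + (main_index[v] - 1) * d_b
          for v in main_index}                              # step 6
    return PA, TA, NA, DA


# Hypothetical 2 x 2 mesh, a=(0,0), b=(1,0), c=(0,1), d=(1,1), with w = <1,-1>.
main_index = {"a": 2, "b": 3, "c": 1, "d": 2}
comp_index = {"a": 1, "b": 1, "c": 2, "d": 2}
edges = [("a", "b", "mu"), ("c", "d", "mu"),
         ("a", "c", "nu"), ("b", "d", "nu")]
PA, TA, NA, DA = construct_mapping(
    main_index, comp_index,
    m={"mu": 1, "nu": -1}, c={"mu": 0, "nu": 1}, w=(1, -1))
```

For this mesh the two data streams move in opposite directions, each with delay 1, and every edge satisfies PA(v_y) = PA(v_x) + n_{l_j} and TA(v_y) = TA(v_x) + d_{l_j}.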

The  three  conditions  of  Theorem  3.2  are  necessary  but  not  sufficient  for  the  existence  of  syntactically 
correct  mappings  for  graphs  in  9„.  However  in  the  next  corollary  we  show  that  in  certain  cases  it  is  both 

necessary  and  sufficient.  Let  G€9„  and  let  «i»)- 

Corollary 3.1: If c_lj>0 for every c_lj∈SC, or if c_lj<0 for every c_lj∈SC, then there exists a
syntactically correct mapping for G if and only if the three conditions in Theorem 3.2 are satisfied.

Proof:  Similar  to  Theorem  3.2  except  in  the  construction  of  the  expressions  for  the  delays.  If  c(j->0 
then  set  da='2,  db=l,  dJ#|=l  and  d^=3.  If  <^<0  then  set  da=-2,  db=3,  d^— 3  and  d^*=l.  In  the 
Appendix  it  is  shown  that  this  construction  yields  dfj>0. 

□ 

The  sufficiency  proof  of  Theorem  3.2  provides  a  methodology  to  synthesize  linear-array  algorithms  for 
graphs  in  O.  The  construction  used  in  the  Theorem  maps  a  program  graph  correctly.  However,  very  often, 
to  ensure  its  correct  execution  we  need  to  use  some  property  of  the  function  represented  by  the 
computation  vertices  in  the  graph.  The  structure  of  graphs  that  can  be  executed  without  using  such 
knowledge is characterized in [9].

We now apply the results described above to synthesize linear-array algorithms for multiplying a band
matrix by a vector, sorting and convolution.

Example  3.1:  Consider  multiplication  of  a  Band  Matrix  M  by  a  Vector  X  as  shown  below. 


[Displayed equation Y=MX: not recoverable from the source.]

Figure 3.3 is a program graph representing this computation.


In Figure 3.3 v_ij denotes a computation vertex. The horizontal, vertical and oblique edges are labelled l1,
l2 and l3 respectively. Let Ψ denote the function represented by any computation vertex in the graph. Ψ
is a 3-ary function such that for any a, b and c, Ψ<a,b,c>=<a+bc,b,c>. Let Ψ_1, Ψ_2, Ψ_3 be the three
projections of Ψ, that is, Ψ_1<a,b,c>=a+bc, Ψ_2<a,b,c>=b and Ψ_3<a,b,c>=c. If a, b and c are the
input values represented by the horizontal, vertical and oblique input edges of v_ij then the output values
represented by the outgoing horizontal, vertical and oblique edges of v_ij are Ψ_1<a,b,c>, Ψ_2<a,b,c> and
Ψ_3<a,b,c> respectively. The input value represented by every horizontal source vertex is initialized to 0.
Let E_l1={horizontal major paths}, E_l2={vertical major paths} and E_l3={oblique major paths}. It can be
seen that L_1={l1,l2}, L_2={∅} and L_3={l3}.

Let SG be a connected component shown in Figure 3.4 that is obtained by removing all the edges
labelled l3 and source and sink vertices labelled l3.

For purposes of clarity SG has been drawn without the source and sink vertices. It can be easily verified
that the program graph in Figure 3.3 is in 0 as SG is a minimally labelled connected component
comprised of L_SG={l1,l2}. Now diagonalize SG with w=<1,-1> to form the set of main diagonals
D. It can be verified that D={D_1,D_2,D_3,D_4} is comprised of four diagonals where
D_1={v_31,v_42,v_53,v_64}, D_2={v_21,v_32,v_43,v_54,v_65}, D_3={v_11,v_22,v_33,v_44,v_55} and
D_4={v_12,v_23,v_34,v_45}.

Next diagonalize SG with wc=<0,1> to form the set Dc of complementary diagonals. It can be
verified that Dc={Dc_1,Dc_2,Dc_3,Dc_4,Dc_5,Dc_6} is comprised of six diagonals where Dc_1={v_11,v_12},
Dc_2={v_21,v_22,v_23}, Dc_3={v_31,v_32,v_33,v_34}, Dc_4={v_42,v_43,v_44,v_45}, Dc_5={v_53,v_54,v_55} and
Dc_6={v_64,v_65}.


In  Figure  3.4  all  the  computation  vertices  belonging  to  the  same  diagonal  in  D  lie  on  the  same  dashed 
line.  Similarly  all  the  computation  vertices  belonging  to  the  same  diagonal  in  Dc  lie  on  one  horizontal 
major  path. 
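The diagonalization just described can be reproduced mechanically. The following is a minimal sketch, not the paper's procedure: it assumes that the l1 coordinate of v_ij is its column j and the l2 coordinate is its row i, and that the band graph contains exactly the vertices listed in the diagonals above. The helper name `diagonalize` is hypothetical.

```python
from collections import defaultdict

def diagonalize(vertices, w):
    """Group (row, col) vertices into diagonals, ordered by increasing weight.

    Assumption: the weight of v_ij is w[0]*j + w[1]*i, i.e. the l1 axis is
    the column index and the l2 axis is the row index.
    """
    groups = defaultdict(list)
    for (i, j) in vertices:
        groups[w[0] * j + w[1] * i].append((i, j))
    # The diagonal with the least weight gets index 1 (list position 0).
    return [sorted(groups[k]) for k in sorted(groups)]

# Vertices of SG in Example 3.1: rows 1..6, columns 1..5, band -1 <= i-j <= 2.
band = [(i, j) for i in range(1, 7) for j in range(1, 6) if -1 <= i - j <= 2]

D = diagonalize(band, (1, -1))   # main diagonals
Dc = diagonalize(band, (0, 1))   # complementary diagonals

for index, diagonal in enumerate(D, start=1):
    print("D_%d" % index, diagonal)
```

Under these assumptions the four main diagonals and six complementary diagonals come out in the same order as the sets listed in the text.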


Now S_D={a_l1,a_l2}, S_Dc={b_l1,b_l2} and m_l1=1, m_l2=-1, c_l1=0 and c_l2=1. It can be seen that this graph
satisfies Theorem 3.2.


Next, using the construction in Theorem 3.2 we synthesize the linear-array algorithm in [5]. |D|=4 and
hence the linear array has 4 processors indexed from 1 to 4. m_l1≠0 and m_l2≠0 and hence use single-phase
clocking. Each processor is comprised of 3 pairs of input/output ports labelled l1, l2 and l3 respectively.
The neighborhood relation imposed by label l3 is empty.


Let si_t^1, si_t^2 and si_t^3 denote the inputs at the input ports labelled l1, l2 and l3 respectively of
processor s at time t and let so_t^1, so_t^2 and so_t^3 denote the outputs computed by s at time t. Then
so_t^1=si_t^1+si_t^2·si_t^3, so_t^2=si_t^2 and so_t^3=si_t^3.
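The per-cycle behaviour of one processor can be written down directly. This is a minimal sketch under the assumption that a processor is a pure function of its three inputs; the names `cell` and `pass_through` are hypothetical. It also shows why placing 0 on the l3 port lets a value on the l1 port travel through a processor unchanged.

```python
def cell(si1, si2, si3):
    """One cycle of a processor: returns (so1, so2, so3) for ports l1, l2, l3."""
    return si1 + si2 * si3, si2, si3

def pass_through(value, l2_values):
    """Carry `value` on port l1 through one processor per entry of `l2_values`,
    with 0 present on port l3 at every step."""
    for b in l2_values:
        value, _, _ = cell(value, b, 0)
    return value

print(cell(1, 2, 3))             # accumulates 1 + 2*3 on l1
print(pass_through(5, [9, 4]))   # the l1 value is unchanged
```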

The computation vertices in D_1, D_2, D_3 and D_4 are mapped onto processors 1, 2, 3 and 4 respectively. From
the construction of Theorem 3.2, we obtain n_l1=1, n_l2=-1, d_l1=1 and d_l2=1. The resulting mapped graph
is shown in Figure 3.5.

The time at which a computation vertex is mapped is indicated by the side of the vertex in Figure 3.5.
For instance, the computation vertex on D_3 and Dc_2 is mapped at time t+2. For correctness of execution
we must ensure the invariance of the two input values ih_1 and ih_2 at their consumption and entry times
and the invariance of the two output values oh_5 and oh_6 at their exit and production times. The
consumption times for ih_1 and ih_2 are t and t+1 respectively. Table 3.1 gives the times at which ih_1
appears at the input port labelled l1 of processors 1 and 2 and ih_2 appears at the input port labelled l1 of
processor 1.

Any element pumped into I_1 or I_2 travels at the rate of 1 processor/cycle as 1/d_l1=1/d_l2=1. Consider
some row of Table 3.1, say 2. The entry in column 1 indicates that ih_2 appears at the input port labelled
l1 of processor 1 at time t. Now Ψ_1 is such that for any b, Ψ_1<a,b,0>=a+b·0=a and hence by pumping
0 into the input port labelled l3 of processor 1 at t invariance of ih_2 at its entry and consumption time can
be maintained. Similarly by pumping 0 into the input ports labelled l3 of processor 1 at t-2 and processor
2 at t-1 invariance of ih_1 at its entry and consumption times can be maintained.

The production times for oh_5 and oh_6 are t+9 and t+10 respectively. Table 3.2 gives the times at which
oh_5 appears at the input port labelled l1 of processor 4 and oh_6 appears at the input ports labelled l1 of
processors 3 and 4.

The entries in Table 3.2 are interpreted in the same way as the entries in Table 3.1. From Table 3.2 it is
seen that by pumping 0 into the input port labelled l3 of processor 3 at t+10 and processor 4 at t+9 and
t+11 invariance of oh_5 and oh_6 at their production and exit times can be maintained.

Lastly, as Ψ_2<a,b,c>=b for any a and any c, the input value iv_i and output value ov_i do not change
as they travel through processors in the array.


Example 3.2: We wish to sort the set of elements {2, 10, 5, 6}. A program graph that performs
sorting is shown in Figure 3.6 below.

In Figure 3.6 v_ij denotes a computation vertex. Each computation vertex represents the computation of
the minimum and maximum of the two input elements denoted by the incoming horizontal and vertical
edges. The outgoing horizontal and vertical edges denote the minimum and maximum respectively of the
two input elements computed by the computation vertex. The horizontal edges are labelled l1 and the
source and sink vertices connected to horizontal edges are all labelled l1. The vertical edges are labelled
l2 and the source and sink vertices connected to vertical edges are all labelled l2. The set of horizontal
source vertices is {ih_1, ih_2, ih_3, ih_4} and the set of horizontal sink vertices is {oh_1, oh_2, oh_3, oh_4}.
Similarly the set of vertical source vertices is {ik_1, ik_2, ik_3, ik_4} and the set of vertical sink vertices is
{ok_1, ok_2, ok_3, ok_4}. The initial values represented by the source vertices ik_1, ik_2, ik_3 and ik_4 are
2, 10, 6 and 5 respectively. The initial values represented by the source vertices ih_1, ih_2, ih_3 and ih_4
are all ∞. It can be verified that the final values represented by the sink vertices oh_1, oh_2, oh_3 and oh_4
are 2, 5, 6 and 10 respectively. We synthesize the algorithm known in literature as the 'rebound sorter' [1].
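The claim about the sink values can be checked by evaluating the program graph directly. The sketch below is an illustration under an assumed graph shape: row i holds the vertices v_ii, ..., v_in, the horizontal source ih_i enters v_ii with value ∞, and the vertical source ik_j enters at the top of column j. The function name `sort_graph` is hypothetical.

```python
import math

def sort_graph(keys):
    """Evaluate the triangular min/max program graph and return the values
    arriving at the horizontal sinks oh_1 .. oh_n."""
    n = len(keys)
    vertical = list(keys)            # value currently travelling down column j
    sinks = []
    for i in range(n):               # row i holds vertices v_ii .. v_in
        h = math.inf                 # horizontal source ih_i
        for j in range(i, n):
            # Horizontal edge carries the minimum, vertical edge the maximum.
            h, vertical[j] = min(h, vertical[j]), max(h, vertical[j])
        sinks.append(h)
    return sinks

print(sort_graph([2, 10, 6, 5]))
```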

Let E_l1={horizontal paths} and E_l2={vertical paths}. Hence L_1={l1,l2}, L_2={∅} and L_3={∅}. It
can be verified that this graph belongs to the class 0 as the minimally labelled connected component
comprised of the two labels from L_1 is G itself.

Form the set of main diagonals D by choosing the diagonalization factor w to be <1,-1>. It can be
verified that D={{v_11,v_22,v_33,v_44}, {v_12,v_23,v_34}, {v_13,v_24}, {v_14}}.

Let D={D_1,D_2,D_3,D_4} where D_1={v_11,v_22,v_33,v_44}, D_2={v_12,v_23,v_34}, D_3={v_13,v_24} and D_4={v_14}.
It can be verified that the indices of D_1, D_2, D_3 and D_4 are 1, 2, 3 and 4 respectively in the ordering of
D.

D is obtained by diagonalizing with <1,-1> and hence form Dc by choosing its diagonalization factor
wc to be <0,1>. It can be verified that Dc={{v_11,v_12,v_13,v_14}, {v_22,v_23,v_24}, {v_33,v_34}, {v_44}}. Let
Dc={Dc_1,Dc_2,Dc_3,Dc_4} where Dc_1={v_11,v_12,v_13,v_14}, Dc_2={v_22,v_23,v_24}, Dc_3={v_33,v_34} and
Dc_4={v_44}. It can be verified that the indices of Dc_1, Dc_2, Dc_3 and Dc_4 are 1, 2, 3 and 4 respectively in
the ordering of Dc.


In  Figure  3.7  above  all  the  computation  vertices  belonging  to  a  single  diagonal  in  D  lie  on  the  same 
dashed  line.  Similarly,  all  the  computation  vertices  belonging  to  a  single  diagonal  in  Dc  lie  on  one 
horizontal  major  path. 

Now, S_D={a_l1,a_l2} and S_Dc={b_l1,b_l2}. a_l1 and a_l2 are consistent with respect to r_D and b_l1 and b_l2 are
consistent with respect to r_Dc. Hence conditions 1 and 2 of Theorem 3.2 are satisfied. It can be seen that
m_l1=1, m_l2=-1, c_l1=0 and c_l2=1. It can be also verified that condition 3 of Theorem 3.2 is satisfied by
the sorting graph.

Using the construction described in Theorem 3.2 we map the sorting graph. |D|=4 and hence the linear
array has 4 processors indexed from 1 to 4. m_l1≠0 and m_l2≠0 and hence use single-phase clocking. Each
processor is comprised of 2 pairs of input/output ports labelled l1 and l2 respectively.

Let si_t^1 and si_t^2 denote the inputs at the input ports labelled l1 and l2 respectively of processor s at time t
and let so_t^1 and so_t^2 denote the outputs computed by s at time t. Then, so_t^1=Min(si_t^1, si_t^2) and
so_t^2=Max(si_t^1, si_t^2).

The computation vertices in D_1, D_2, D_3 and D_4 are mapped onto processors 1, 2, 3 and 4 respectively.
Using the construction in Theorem 3.2 we obtain n_l1=1, n_l2=-1, d_a=2, d_b=1, d_l1=1 and d_l2=1. The
resulting mapped graph is shown in Figure 3.8.
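The constants quoted for the sorting graph can be cross-checked with a short script. This is a sketch under a stated assumption about how the time assignment of step 6 is read: TA(v)=t_0+(q-1)d_a+(p-1)d_b, where p is the index of the main diagonal containing v and q is the index of its complementary diagonal. The function names are hypothetical. With d_a=2 and d_b=1 every horizontal edge then moves one processor up in one cycle, and every vertical edge moves one processor down in one cycle.

```python
def map_sort_graph(n=4, d_a=2, d_b=1, t0=0):
    """Assign each vertex v_ij (i <= j) a processor PA and a time step TA."""
    PA, TA = {}, {}
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            p = j - i + 1            # index of the main diagonal (w = <1,-1>)
            q = i                    # index of the complementary diagonal (wc = <0,1>)
            PA[(i, j)] = p
            TA[(i, j)] = t0 + (q - 1) * d_a + (p - 1) * d_b
    return PA, TA

def edge_offsets(PA, TA):
    """Collect (processor offset, delay) pairs seen on horizontal and vertical edges."""
    horizontal, vertical = set(), set()
    for (i, j) in PA:
        if (i, j + 1) in PA:         # horizontal edge v_ij -> v_i,j+1
            horizontal.add((PA[(i, j + 1)] - PA[(i, j)], TA[(i, j + 1)] - TA[(i, j)]))
        if (i + 1, j) in PA:         # vertical edge v_ij -> v_i+1,j
            vertical.add((PA[(i + 1, j)] - PA[(i, j)], TA[(i + 1, j)] - TA[(i, j)]))
    return horizontal, vertical

PA, TA = map_sort_graph()
horizontal, vertical = edge_offsets(PA, TA)
print(horizontal, vertical)          # (n_l1, d_l1) and (n_l2, d_l2)
```

Under this reading no processor is asked to execute two vertices in the same cycle, since two vertices on the same main diagonal always lie on different complementary diagonals.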


Figure 3.8


Lastly, for any j, 1≤j≤4 we must ensure:

1. the invariance of input values, that is,

a. if PA(v_jj)=s and s>1 then the value represented by ih_j must not change as it travels
from I_1 to the input port labelled l1 of s,

b. if PA(v_1j)=s and s<4 then the value represented by ik_j must not change as it travels
from I_2 to the input port labelled l2 of s.

2. the invariance of output values, that is,

a. if PA(v_jj)=s and s>1 then we must ensure that the value represented by ok_j remains
invariant as it travels from the output port labelled l2 of s to O_2,

b. if PA(v_j4)=s and s<4 then we must ensure that the value represented by oh_j remains
invariant as it travels from the output port labelled l1 of s to O_1.

We need to use some semantic information of the minimum (Min) and maximum (Max) functions
computed by a processor in the array in every cycle. We will use the property that Min(x,-∞)=-∞ and
Max(x,∞)=∞ in our synthesis.

In the mapping we observe that for any j, 1≤j≤4, PA(v_jj)=1 and hence we need not consider (1a) and
(2a).


An element pumped into I_2 travels at the rate of 1 processor/cycle (1/d_l2). Hence, if PA(v_1j)=s
then we can compute the times at which the input value represented by ik_j appears at the input ports
labelled l2 of processors 4,3,...,s+1. This is tabulated in Table 3.3. Similarly, if PA(v_j4)=s then we can
compute the times at which the output value represented by oh_j appears at the input ports labelled l1 of
processors s+1,s+2,...,4. This is tabulated in Table 3.4.


TABLE 3.3


TABLE 3.4


Consider some row, say row 2, in Table 3.3 and Table 3.4. The entries t, t-1 in columns 3 and 4 of Table
3.3 denote the times at which the input value represented by ik_2 appears at the input port labelled l2 of
processors 3 and 4 respectively. Similarly, the entries t+6 and t+7 in columns 3 and 4 of Table 3.4
denote the times at which the output value represented by oh_3 appears at the input port labelled l1 of
processors 3 and 4 respectively.

Now consider row 2 of Table 3.3 and Table 3.4 again. If -∞ appears at the input port labelled l1 of
processors 3 and 4 at times t and t-1 respectively then the input value represented by ik_2 is preserved.
Similarly, if ∞ appears at the input port labelled l2 of processors 3 and 4 at times t+6 and t+7
respectively then the output value represented by oh_3 is preserved.

For every entry in Table 3.3 we compute the times at which -∞ must be pumped into I_1 and this is
tabulated in Table 3.5. Similarly, for every entry in Table 3.4 we compute the times at which ∞ must be
pumped into I_2 and this is tabulated in Table 3.6.


Consider some row, say row 2, in Table 3.5 and Table 3.6. The entry t-4 in column 3 of Table 3.5
indicates that for -∞ to appear at the input port labelled l1 of processor 3 at time t-2, it must be pumped
into I_1 at time t-4. Similarly, the entry t+5 in column 3 of Table 3.6 indicates that for ∞ to appear at the
input port labelled l2 of processor 3 at time t+6, it must be pumped into I_2 at time t+5.

From Table 3.5 we observe that it suffices to pump -∞ into I_1 at times t-6, t-4 and t-2. Similarly, from
Table 3.6 we observe that it suffices to pump ∞ into I_2 at times t+5, t+7 and t+9.

Example 3.3: Consider the convolution problem defined as follows.

Given the sequence of weights {w_1, w_2, ..., w_k} and the input sequence {x_1, x_2, ..., x_n} compute the
output sequence {y_1, y_2, ..., y_{n+1-k}} defined by y_i = Σ_{j=1..k} w_j x_{i+j-1}.

We illustrate the convolution problem on n=5 and k=3. The computation of the convolution problem
for n=5 and k=3 is represented by the program graph of Figure 3.9.


Figure 3.9


In Figure 3.9, ∀i and ∀j | 1≤i,j≤3, v_ij represents a computation vertex. The horizontal, vertical and
oblique edges are labelled l1, l2 and l3 respectively.

Let Ψ denote the function represented by any computation vertex in Figure 3.9. Ψ is a 3-ary function
such that for any a, b and c, Ψ<a,b,c>=<a+bc,b,c>. Let Ψ_1, Ψ_2, Ψ_3 be the three projections of Ψ, that
is, Ψ_1<a,b,c>=a+bc, Ψ_2<a,b,c>=b and Ψ_3<a,b,c>=c. If a, b and c are the input values represented
by the horizontal, vertical and oblique input edges of v_ij then the output values represented by the
outgoing horizontal, vertical and oblique edges of v_ij are Ψ_1<a,b,c>, Ψ_2<a,b,c> and Ψ_3<a,b,c>
respectively. ∀p | 1≤p≤5, ∀q | 1≤q≤3 and ∀r | 1≤r≤3, let the input values represented by is_p, iv_q and
ih_r be x_p, w_q and 0 respectively. It can then be verified that the output value represented by oh_r is
Σ_{q=1..3} w_q x_{r+q-1}.

Let E_l1={horizontal major paths}, E_l2={vertical major paths} and E_l3={oblique major paths}. It can
be seen that L_1={l1,l2,l3}, L_2={∅} and L_3={∅}.
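The value reaching each horizontal sink can be checked by chaining the vertex function along a horizontal major path. The sketch below is an illustration, assuming that the vertex in row r and column q receives the weight w_q on its vertical edge and the input x_{r+q-1} on its oblique edge; the function names `psi` and `convolve_graph` are hypothetical, and the sequences are 0-indexed in the code.

```python
def psi(a, b, c):
    """Vertex function: <a, b, c> -> <a + b*c, b, c>."""
    return a + b * c, b, c

def convolve_graph(w, x):
    """Evaluate each horizontal major path of the k-column program graph."""
    k, n = len(w), len(x)
    outputs = []
    for r in range(n + 1 - k):       # one row per output element
        a = 0                        # horizontal source is initialized to 0
        for q in range(k):
            a, _, _ = psi(a, w[q], x[r + q])
        outputs.append(a)
    return outputs

w = [1, 2, 3]
x = [1, 2, 3, 4, 5]
print(convolve_graph(w, x))
```

For n=5 and k=3 this produces three outputs, one per row of the graph, each equal to the defining sum.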


Let SG be the connected component shown in Figure 3.10 that is obtained by removing all the edges
labelled l3 and source and sink vertices labelled l3.


Figure 3.10

For purposes of clarity again SG has been drawn without the source and sink vertices. It can be seen that
the program graph in Figure 3.9 is in Q as SG is a minimally labelled connected component comprised of
two labels l1 and l2.


Now diagonalize SG with w=<1,0> to form the set D of main diagonals. It can be verified that
D={D_1,D_2,D_3} is comprised of three diagonals where D_1={v_11,v_21,v_31}, D_2={v_12,v_22,v_32} and
D_3={v_13,v_23,v_33}.

Next diagonalize SG with wc=<0,1> to form the set Dc of complementary diagonals. It can be verified
that Dc={Dc_1,Dc_2,Dc_3} is also comprised of three diagonals where Dc_1={v_11,v_12,v_13},
Dc_2={v_21,v_22,v_23} and Dc_3={v_31,v_32,v_33}.

In  Figure  3.10  all  the  computation  vertices  belonging  to  a  single  diagonal  in  D  lie  on  the  same  vertical 
major  path.  Similarly,  all  the  vertices  belonging  to  a  single  diagonal  in  Dc  lie  on  the  same  horizontal 
major  path. 

Now S_D={a_l1,a_l2,a_l3}, S_Dc={b_l1,b_l2,b_l3} and m_l1=1, m_l2=0, m_l3=-1, c_l1=0, c_l2=1 and c_l3=1. It can
be verified that Theorem 3.2 is satisfied.

We next synthesize the linear-array algorithm in [7]. |D|=3 and hence the linear array has 3 processors
indexed from 1 to 3. m_l2=0 and there exist transitive edges labelled l2. Hence use two-phase clocking.

Each processor is comprised of 3 pairs of input/output ports labelled l1, l2 and l3 respectively.

Let si_t^1, si_t^2 and si_t^3 denote the inputs at the input ports labelled l1, l2 and l3 respectively of
processor s at time t and let so_t^1, so_t^2 and so_t^3 denote the outputs computed by s at time t. Then,
so_t^1=si_t^1+si_t^2×si_t^3, so_t^2=si_t^2 and so_t^3=si_t^3.

Using the construction in Theorem 3.2, we obtain n_l1=1, n_l2=0 and n_l3=-1. We also obtain d_l1=1,
d_l2=2 and d_l3=1. The computation vertices in D_1, D_2 and D_3 are mapped onto processors 1, 2 and 3
respectively. The resulting mapped graph is shown in Figure 3.11.


Lastly, we must use some semantic properties of Ψ for correctness of execution. Ψ_2 and Ψ_3 are such that
for any a, b and c, Ψ_2<a,b,c>=b and Ψ_3<a,b,c>=c. Hence, the input/output value represented by the
source/sink vertices of any vertical or oblique major path does not change as it travels through
processors in the linear array. In Figure 3.11 it is seen that the entry and consumption (production and
exit) times for every input (output) value represented by every horizontal source (sink) vertex are the
same.

Let t_s be the time when the computation begins. Clearly t_s<t. Since n_l2=0 a register in each processor
serves as the input/output port labelled l2. Let r_1, r_2 and r_3 denote such a register in processors 1, 2 and 3
respectively. Then the input values of iv_1, iv_2 and iv_3, which are w_1, w_2 and w_3 respectively, are preloaded
into r_1, r_2 and r_3 respectively before t_s.

3.2.  Cube  Graphs 


A natural generalization of the program graphs in 6 are graphs whose minimal label set is
comprised of more than two labels. Program graphs for important problems like matrix multiplication,
LU-decomposition of matrices and operations on relations in relational databases are examples of such
graphs. A complete characterization of such graphs seems difficult and in this section we examine an
important subset of such program graphs and provide a technique for correctly mapping such graphs.

Let G=<V,E,L_G> be a program graph with its label set L_G={l1,l2,l3}. Let I={I_1, I_2, I_3} be a family
of sets of sequences of integers ranging from 0 to h_1, 0 to h_2 and 0 to h_3 respectively. Let B⊆I_1×I_2×I_3.

Definition 3.9: G is a Cube Graph iff there exists a one-one function F:V_G → B where:

1. V_G is the set of computation vertices in G.

2. Let F_l1, F_l2 and F_l3 be three projection functions of F, that is, if F(v_x)=<c_1,c_2,c_3> then
F_l1(v_x)=c_1, F_l2(v_x)=c_2 and F_l3(v_x)=c_3. Let v_x and v_y be any two computation vertices in V_G.
Then, for any label l∈L_G, there exists a major path labelled l passing through v_x and v_y such
that the distance from v_x to v_y is d iff F_l(v_y)=F_l(v_x)+d and ∀t∈L_G-{l}, F_t(v_y)=F_t(v_x).

A Cube Graph is an object in Euclidean 3-Space and we will refer to the 3 axes as l1th, l2nd and l3rd
axes. h_1, h_2 and h_3 are the maximum dimensions along l1th, l2nd and l3rd axes respectively and h_1≥1,
h_2≥1 and h_3≥1. If v_x is a computation vertex in a Cube Graph then we will refer to F_l1(v_x), F_l2(v_x) and
F_l3(v_x) as l1th, l2nd and l3rd coordinate respectively and denote them by x_l1, x_l2 and x_l3 respectively.

Let H={1}×{1,-1}×{1,-1} be the cartesian product of the set {1,-1}. Let w=<w_1, w_2, w_3>∈H.

Definition 3.10: A Diagonalization of a Cube Graph is a pair <D,w> with the following properties.

1. D={D_1, D_2, ..., D_k} is a family of ordered sets of computation vertices and
D_1∪D_2∪...∪D_k=V_G.

2. For any D_p in D, if v_x and v_y are in D_p then w_1x_l1+w_2x_l2+w_3x_l3 = w_1y_l1+w_2y_l2+w_3y_l3.

3. Let r_D denote the indexing function associated with the ordered set D. For any pair of D_p and
D_q in D, if v_x and v_y are in D_p and D_q respectively then r_D(D_p)<r_D(D_q) iff
w_1x_l1+w_2x_l2+w_3x_l3 < w_1y_l1+w_2y_l2+w_3y_l3.

We will refer to w as the Diagonalization Factor of the Cube Graph. Let w_p denote the weight of the
diagonal D_p in D, that is, if v_x is a vertex in D_p then w_1x_l1+w_2x_l2+w_3x_l3=w_p.

Consecutive  indices  are  assigned  to  the  diagonals  in  D  with  the  diagonal  having  the  least  weight  assigned 
index  1. 
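Definition 3.10 translates into a short grouping routine. This is a minimal sketch, assuming (beyond what the definition requires) that every integer point of the box [0,h_1]×[0,h_2]×[0,h_3] is a computation vertex; the function name `diagonalize_cube` is hypothetical.

```python
from itertools import product

def diagonalize_cube(h, w):
    """Return the diagonals of a full cube graph, ordered by increasing weight.

    h = (h_1, h_2, h_3): maximum coordinate along each axis (coordinates start at 0).
    w = (w_1, w_2, w_3): diagonalization factor.
    Assumption: every lattice point of the box is a computation vertex.
    """
    groups = {}
    for x in product(*(range(hi + 1) for hi in h)):
        weight = sum(wi * xi for wi, xi in zip(w, x))
        groups.setdefault(weight, []).append(x)
    # The least weight receives index 1 (list position 0).
    return [groups[k] for k in sorted(groups)]

D = diagonalize_cube((2, 1, 1), (1, 1, 1))
for index, diagonal in enumerate(D, start=1):
    print("D_%d" % index, diagonal)
```

For instance, with h=(2,1,1) and w=<1,1,1> the twelve vertices fall into five diagonals.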

Throughout the rest of this section G will refer to a Cube Graph. l1, l2 and l3 will refer to the three
labels in its label set L_G and the subscript of a diagonal will refer to its index, that is, if D_p is a diagonal
in D then its index is p.

[Remark 1: A Mesh Graph is a Cube Graph with |L_G|=2, that is, cardinality of the label set is 2 and
Diagonalization of a Cube Graph is a generalization of Diagonalization of a Mesh Graph.

Remark 2: A minimally labelled connected component SG of a Cube Graph with L_SG={l1, l2, l3} is G
itself.]

Let l∈L_G. Let MG={MG_1, MG_2, ..., MG_k} be the set of connected components formed by removing all
the edges labelled l and source and sink vertices labelled l from G. The label set for any MG_i in MG is
L_G-{l}.

Lemma 3.1: MG_i is a Mesh Graph.

Proof:  Follows  immediately  from  definitions  of  Mesh  and  Cube  Graphs. 

□ 


We  next  combine  the  Mesh  Graphs  in  MG  into  classes  as  follows. 


Let CG={CG_1, CG_2, ..., CG_n} be a family of sets of Mesh Graphs such that CG_i={MG_q∈MG | if v_x is
a computation vertex in MG_q then F_l(v_x)=i} (that is, Mesh Graphs in CG_i have the property that the lth
coordinate of their computation vertices is i).

[Note: F_l is F_l1 if l=l1 or F_l2 if l=l2 or F_l3 if l=l3. Also the lth coordinate is l1th coordinate if l=l1 or
l2nd coordinate if l=l2 or l3rd coordinate if l=l3.]

We next describe the algorithm to map a Cube Graph onto a linear array. Let Ar=<N,L_Ar,Ψ_Ar>
denote the linear array onto which G is mapped. Without loss of generality, let l=l3. So the label set of
any Mesh Graph within any set in CG is {l1, l2}. Let Ψ denote the function represented by a
computation vertex in G.

Choose some Diagonalization Factor w=<w_1, w_2, w_3> from H. Let D be the set of diagonals obtained
for this w. Let |D|=m. Choose the number of processors in N to be m, that is, let |N|=|D|=m. Let
Ψ_Ar=Ψ and L_Ar=L_G. Let D={D_1, D_2, ..., D_m} denote the ordered set of diagonals in D and let {1,2,...,m}
denote the sequence of processor numbers in N.

The algorithm that maps G onto Ar is explained in three phases. In the first phase we show how to
choose the neighborhood constants n_l1, n_l2 and n_l3 for the labels l1, l2 and l3. We also show how to
construct the function PA that maps computation vertices of G onto processors in Ar. In the second phase
we show how to choose the delays d_l1 and d_l2 for the labels l1 and l2. We also show how to map Mesh
Graphs in CG in this phase. In the third phase we show how to determine the delay d_l3 for label l3. We
also show how to construct the function TA that maps computation vertices onto time steps by
composing the mappings of the Mesh Graphs constructed in phase two.

Phase  One 

Let n_l1=w_1, n_l2=w_2 and n_l3=w_3. For every computation vertex v_x in diagonal D_i let PA(v_x)=i, that
is, map the computation vertices in the ith diagonal onto processor i.

Phase  Two 

1. set d_l1=1. If n_l2=1 then set d_l2=2 else set d_l2=1.

2. For every CG_i do the following:

a. let v_i denote the computation vertex whose coordinates are <0,0,i>. Let TA(v_i)=t_i (we
will show in phase three how to determine t_i),

b. if v_x is a computation vertex in any Mesh Graph in CG_i, let TA(v_x)=t_i+x_l1d_l1+x_l2d_l2.

Phase Three

We first show how to determine d_l3.

1. if n_l1=n_l2 then

a. if h_1-h_2+n_l3>0 then choose d_l3=h_1+1+2n_l3,

b. if h_1-h_2+n_l3<0 then choose d_l3=h_2+1+n_l3,

2. if n_l1≠n_l2 then

a. if h_2-h_1+n_l3≥0 then choose d_l3=2h_2+1+n_l3,

b. if h_2-h_1+n_l3<0 then choose d_l3=2h_1+1-n_l3.

Once d_l3 is determined, we compose the mapping of the Mesh Graphs in CG by letting t_{i+1}=t_i+d_l3.

We show that this mapping is syntactically correct in the Appendix.


Phases one, two and three perform a syntactically correct mapping of a Cube Graph onto a linear
array. However, to demonstrate a correct execution of the program represented by the Cube Graph, some
semantic information about the function represented by the computation vertex in the Cube Graph needs
to be used. We show this in the following examples, wherein we use the mapping algorithm described
above to synthesize the novel linear array algorithms for multiplying matrices that we reported in [10].
These algorithms multiply two n×n matrices using O(n) processors in O(n^2) time steps. The processors
used require no control and no addressable memory, and the array requires no loading and unloading
circuitry.


Example 3.4: Consider multiplication of two matrices A and B as shown below.

A program for computing this multiplication is given by the following recurrence.

c_ij^(k+1) = c_ij^(k) + a_ik b_kj,  1≤i,k≤2, 1≤j≤3,

c_ij^(1) = 0

The program graph in Figure 3.12 is a representation of this program. In Figure 3.12, p_ij and q_ij denote
computation vertices. The horizontal, vertical and oblique incident edges of p_ij are labelled l1, l2 and l3
respectively. Similarly the horizontal, vertical and oblique outgoing edges of q_ij are labelled l1, l2 and l3
respectively. If the horizontal, vertical and oblique incident edges of p_ij or q_ij represent the values a, b and
c respectively then the horizontal, vertical and oblique outgoing edges of p_ij or q_ij represent the values a, b
and c+ab respectively. In Figure 3.12, the oblique input edge incident on p_ij represents the value c_ij^(1)
which is 0. The oblique outgoing edge from q_ij represents the final (output) value c_ij^(3) of c_ij, that is,

c_ij^(3) = a_i1 b_1j + a_i2 b_2j


The program graph in Figure 3.12 is a Cube Graph as illustrated in Figure 3.13. The Cube Graph is
shown without the source and sink vertices for purposes of clarity. The maximum dimensions of the l1th,
l2nd and l3rd axes are 2, 1 and 1 respectively, that is, h_1=2, h_2=1 and h_3=1. We next map this graph onto a
linear array using the mapping algorithm described earlier.

Let w=<w_1, w_2, w_3>=<1, 1, 1>. It can be verified that for this choice of w, the set D of diagonals is
comprised of {D_1, D_2, D_3, D_4, D_5} where D_1={p_11}, D_2={p_21, p_12, q_11}, D_3={p_22, p_13, q_12, q_21},
D_4={p_23, q_13, q_22} and D_5={q_23}. Since |D|=5, the linear array has 5 processors indexed from 1 to 5.
Each processor is comprised of 3 pairs of input/output ports labelled l1, l2 and l3 respectively.


Let si_t^1, si_t^2 and si_t^3 denote the inputs at the input ports labelled l1, l2 and l3 respectively of processor
indexed s at time t and let so_t^1, so_t^2 and so_t^3 denote the outputs computed by s at t. Then, so_t^1=si_t^1,
so_t^2=si_t^2 and so_t^3=si_t^3+si_t^1·si_t^2.

From phase one, we obtain n_l1=1, n_l2=1 and n_l3=1 and so all the computation vertices in D_i are mapped
onto processor i.


n(„=l  and  so  from  phase  two,  we  obtain  d^=l  and  d(2=2  as  the  delays  for  il  and  12.  Now  n;i=n;2 
and  hj-ho+n(3>0  and  so  from  phase  three,  we  obtain  d,3=hj  +  l+2n,3=2+ 1+2=5  and  hence  t2=t E  +-o. 
The  composed  mapping  for  the  entire  graph  is  shown  in  Figure  3.15. 
In Figure 3.14, $I_1$, $I_2$ and $I_3$ are the input ports labelled $l_1$, $l_2$ and $l_3$ respectively of processor 1. $O_1$, $O_2$ and $O_3$ are the output ports labelled $l_1$, $l_2$ and $l_3$ respectively of processor 5. These are the ports of the linear array through which external communication takes place. The elements of the matrices A, B and C are pumped into the array through the ports $I_1$, $I_2$ and $I_3$ respectively. The computed values of matrix C emerge out of the port $O_3$.


Lastly, we must show that:


1. for any i and j, if $PA(p_{ij})=s$ (i.e., if s is the processor onto which $p_{ij}$ is mapped) and s>1 then the input value of $c_{ij}$ does not change as it travels from $I_3$ to the input port labelled $l_3$ of s,

2. for any i and j, if $PA(q_{ij})=s$ and s<5 then the output value of $c_{ij}$ does not change as it travels from the output port labelled $l_3$ of s to $O_3$.

An element pumped into $I_3$ travels at a velocity of 0.2 processors/cycle ($1/d_{l_3}$). Hence if $PA(p_{ij})=s$ then we can compute the times at which the input value of $c_{ij}$ appears at the input ports labelled $l_3$ of processors indexed 1, 2, ..., s-1. Similarly if $PA(q_{ij})=s$ then we can compute the times at which the output value of $c_{ij}$ appears at the input ports labelled $l_3$ of processors s+1, s+2, ..., 5. This is shown in Table 3.7. Consider some row, say row 5, in Table 3.7. The entries $t_1-11$, $t_1-6$ and $t_1-1$ in columns 1, 2 and 3 denote the times at which the input value of $c_{13}$ appears at the input port labelled $l_3$ of processors indexed 1, 2 and 3 respectively.

Consider row 5 again. If the value 0 appears on either of the other two input ports of processors 1, 2 and 3 at times $t_1-11$, $t_1-6$ and $t_1-1$ then the value represented by $c_{13}$ is preserved, since the output on $l_3$ is the input on $l_3$ plus the product of the other two inputs. An element pumped into $I_1$ travels at the rate of 1 processor/cycle ($1/d_{l_1}$). It can be verified that if 0 is pumped into $I_1$ at times $t_1-11$, $t_1-7$ and $t_1-3$ then 0 will appear at the input ports labelled $l_1$ of processors 1, 2 and 3 at times $t_1-11$, $t_1-6$ and $t_1-1$ respectively.
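The timing argument above can be checked with a short Python sketch. It assumes a simple model that is not spelled out in this passage: a value pumped into processor 1 at some cycle is seen at processor 1 in that same cycle, and needs `delay` further cycles for each subsequent processor. The function names are hypothetical, and times are expressed relative to $t_1$ (so $t_1$ is taken as 0).

```python
def arrival_time(t_pump, processor, delay):
    """Cycle at which a value pumped into processor 1 at t_pump reaches `processor`."""
    return t_pump + (processor - 1) * delay


def pump_time(t_arrive, processor, delay):
    """Cycle at which a value must be pumped in to reach `processor` at t_arrive."""
    return t_arrive - (processor - 1) * delay


# Stream on l3 (delay 5): the input value pumped in at t1-11.
c_times = [arrival_time(-11, s, 5) for s in (1, 2, 3)]

# Stream on l1 (delay 1): when must 0 be pumped in to meet that value?
zero_pumps = [pump_time(t, s, 1) for s, t in zip((1, 2, 3), c_times)]

print(c_times)     # offsets from t1 at processors 1, 2, 3
print(zero_pumps)  # offsets from t1 at which 0 enters I1
```

The same `pump_time` helper reproduces the kind of entry tabulated for the zero stream: for 0 to be present at processors 1 and 2 at the same cycle, it must be pumped in one cycle earlier for processor 2 than for processor 1.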

For every entry in Table 3.7, we compute the times at which 0 must be pumped into $I_1$ and this is tabulated in Table 3.8. Consider some row in Table 3.8, say row 6. The entries $t_1-3$ and $t_1-4$ in columns 1 and 2 indicate that for 0 to appear at the input port labelled $l_1$ of processors 1 and 2 at time $t_1-3$, 0 must be pumped into $I_1$ at times $t_1-3$ and $t_1-4$.

From Table 3.8 we observe that it suffices to pump 0 into $I_1$ between $t_1-11$ and $t_1-3$ and also between $t_1+8$ and $t_1+16$.


Table  3-7 


Table  3.8 
Example 3.4: Consider again the multiplication of matrices A and B of example 1 for a different choice of w. Let $w=\langle w_1, w_2, w_3\rangle = \langle 1, 1, -1\rangle$. For this choice of w, the set D of diagonals is comprised of $D_1=\{q_{11}\}$, $D_2=\{p_{11}, q_{12}, q_{21}\}$, $D_3=\{p_{12}, p_{21}, q_{13}, q_{22}\}$, $D_4=\{p_{13}, p_{22}, q_{23}\}$ and $D_5=\{p_{23}\}$.

We use |D|=5 processors indexed from 1 to 5. The neighborhood constants for labels $l_1$, $l_2$ and $l_3$ are $n_{l_1}=1$, $n_{l_2}=1$ and $n_{l_3}=-1$. The vertices in $D_i$ are mapped onto the processor indexed i. The delays for the labels $l_1$, $l_2$ and $l_3$ are $d_{l_1}=1$, $d_{l_2}=2$ and $d_{l_3}=1$. The resulting mapping of the entire Cube Graph is shown in Figure 3.15. In Figure 3.15, $I_1$ and $I_2$ are the input ports labelled $l_1$ and $l_2$ respectively of processor 1 and $O_3$ is the output port labelled $l_3$ of processor 1. Similarly $O_1$ and $O_2$ are the output ports labelled $l_1$ and $l_2$ respectively of processor 5 and $I_3$ is the input port labelled $l_3$ of processor 5. These are the ports of external communication.

Constructions similar to those used for Table 3.7 and Table 3.8 are used to construct Table 3.9 and Table 3.10 respectively. From Table 3.10 we observe that it suffices to pump 0 into $I_1$ between $t_1-7$ and $t_1-2$ and also between $t_1+3$ and $t_1+8$.
4. Conclusions

We presented a formal model of linear arrays suitable for VLSI, and introduced homogeneous graphs, which are a natural representation of programs potentially executable on such arrays. We then introduced Θ graphs, which are subsets of homogeneous graphs, and provided a set of necessary and sufficient conditions on the structure of graphs in Θ for the existence of a syntactically correct mapping. As a practical consequence we developed a technique to synthesize linear-array algorithms for programs in Θ and synthesized a few published algorithms.

Subsequently, we examined Cube Graphs, which are more general than graphs in Θ, and showed a technique to map such graphs correctly onto linear arrays. As a consequence we synthesized some novel linear-array matrix multiplication algorithms.

The technique to correctly map a Cube Graph can be generalized to correctly map Hypercube Graphs (that is, Cube Graphs in Euclidean K-space where K>3) onto linear arrays. The details appear in [9]. However, Hypercube Graphs are only proper subsets of graphs that are not in Θ. The structure of any correctly mappable graph that is not in Θ is an open question.
Appendix


Theorem 3.1 and Theorem 3.2 characterize the structure of graphs in Θ. We now develop the proofs for these two theorems. In addition we will also show that the algorithm to map a Cube Graph onto a linear array is syntactically correct.

We first establish certain fundamental results on major paths and Mesh Graphs which we will use later on in the proofs of the two theorems.

We will continue to follow the notational conventions adopted in the beginning of Section 3. Additionally, for any two computation vertices $v_x$ and $v_y$, we will use $\Delta_P(v_x, v_y)$ and $\Delta_T(v_x, v_y)$ to denote $PA(v_y)-PA(v_x)$ and $TA(v_y)-TA(v_x)$ respectively. Also, all the labels in G will be assumed to be in $L_1$ unless mentioned otherwise, and $S_G$ will denote a minimally labelled connected component of G.


A.1  Properties  of  Major  Paths 

A major path specifies some transformation that a data item undergoes, and a correct mapping of a program graph preserves the transformations of all the major paths in the graph. The value represented by a major path will be either the input value represented by the source vertex, or the output value represented by the sink vertex, or an intermediate value represented by an edge between a pair of computation vertices in the major path. All major paths in a program graph are unique as we have not assumed any properties of the function represented by the computation vertices in the graph. So a value represented by a major path is also unique. We use uniqueness to mean that the value represented by a major path is distinguishable from the value represented by any other major path.

The processor model that we have used in the linear array does not have any branching ability. This imposes certain restrictions on major paths labelled $l_j$ in mappings where the neighborhood constant $n_{l_j}$ is 0. These restrictions are captured in the following lemma.

Lemma A.1: Let $l_j \in L_1$ and let $v_x$ and $v_y$ be any two computation vertices. In any mapping, if $n_{l_j}=0$ and $PA(v_y)=PA(v_x)$ (i.e., the neighborhood constant of label $l_j$ is 0 and $v_x$ and $v_y$ are mapped on the same processor) then $v_x$ and $v_y$ must be in the same major path labelled $l_j$.

Proof: If $n_{l_j}=0$ then in every processor a register serves as the processor's I/O port labelled $l_j$ and a value is preloaded into this register. So if $v_x$ and $v_y$ are in different major paths labelled $l_j$ then two registers would be needed: one to hold the value of the first major path and the second to hold the value of the second major path. The processor would then require branching to choose one of the two registers whenever it is in active phase.

□ 


In  the  following  lemma  we  relate  the  vertices  and  edges  in  a  path  to  the  processors  and  time  steps  at 
which  they  are  mapped. 

Lemma A.2: Let $v_x$ and $v_y$ be any pair of vertices in G. Consider any path p from $v_x$ to $v_y$. For any label $l_j$ let $k_j^+$ and $k_j^-$ denote the number of edges labelled $l_j$ in p whose directions are consistent and not consistent respectively with the directed path from $v_x$ to $v_y$ through the same sequence of vertices as in p. Then in any correct mapping of G,

$$\sum_{l_j} (k_j^+ - k_j^-)\, n_{l_j} = \Delta_P(v_x,v_y) \quad\text{and}\quad \sum_{l_j} (k_j^+ - k_j^-)\, d_{l_j} = \Delta_T(v_x,v_y).$$

Proof: Let $\sum_{l_j} (k_j^+ + k_j^-) = n$. So n is the path length. The lemma is easily established by induction on n.


From  the  above  lemma  the  following  result  on  major  paths  is  immediate. 

Lemma A.3: Consider any major path labelled $l_j$ and let $v_x$ and $v_y$ be any two vertices in this major path. Then in any correct mapping of G, $\Delta_P(v_x,v_y)\,d_{l_j}=\Delta_T(v_x,v_y)\,n_{l_j}$.

Proof: Immediate from Lemma A.2.


□ 
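Lemma A.2 and Lemma A.3 can be illustrated with a small Python sketch. The representation of a path as a list of (label, direction) pairs, the function name `displacement`, and the sample neighborhood constants and delays are assumptions made for this illustration only; direction +1 marks an edge traversed consistently with the directed path and -1 an edge traversed against it.

```python
def displacement(path, n, d):
    """Return (delta_P, delta_T) accumulated along a path.

    path: list of (label, direction) pairs, direction being +1 or -1.
    n, d: dictionaries giving the neighborhood constant and delay per label.
    """
    delta_p = sum(direction * n[label] for label, direction in path)
    delta_t = sum(direction * d[label] for label, direction in path)
    return delta_p, delta_t


# Sample constants chosen for illustration.
n = {"l1": 1, "l2": 1, "l3": 1}
d = {"l1": 1, "l2": 2, "l3": 5}

# A mixed path: forward on l1, forward on l3, backward on l2.
mixed = [("l1", +1), ("l3", +1), ("l2", -1)]
print(displacement(mixed, n, d))

# A stretch of a major path labelled l3 (three forward edges):
# Lemma A.3 says delta_P * d equals delta_T * n for that label.
dp, dt = displacement([("l3", +1)] * 3, n, d)
print(dp * d["l3"] == dt * n["l3"])
```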


We  next  show  that  if  two  major  paths  have  the  same  set  of  computation  vertices  then  they  must  be 
identical. 


Lemma A.4: Let $q_1$ and $q_2$ be two major paths. Let $V_1$ and $V_2$ be the sets of computation vertices in $q_1$ and $q_2$ respectively. If $V_1=V_2$ and there exists a correct mapping for G then $q_1$ and $q_2$ must be identical.


Proof: Suppose $q_1$ and $q_2$ are not identical. Then there must exist two computation vertices $v_x$ and $v_y$ in $q_1$ and $q_2$ such that $v_x$ precedes $v_y$ in $q_1$ and $v_y$ precedes $v_x$ in $q_2$. Now consider any correct mapping of G. $v_x$ precedes $v_y$ in $q_1$ and so $\Delta_T(v_x,v_y)>0$. Likewise $v_y$ precedes $v_x$ in $q_2$ and so $\Delta_T(v_x,v_y)<0$, a contradiction.

□ 


We are now in a position to show that there can be at most one label in $L_1$ whose neighborhood constant can be 0.

Lemma A.5: Let $l_i$ and $l_j$ be any two labels. Then in any correct mapping of G, if $n_{l_i}=n_{l_j}$ then $n_{l_i}\in\{1,-1\}$.

Proof: $l_i$ and $l_j$ are in $L_1$, so there exists a major path $q_r$ labelled $l_i$ that is not identical to any of the major paths labelled $l_j$. This implies that there exists a major path $q_s$ labelled $l_j$ and,

1. either the computation vertices in $q_s$ and $q_r$ are the same,

2. or the computation vertices in $q_s$ are a subset of the computation vertices in $q_r$,

3. or the computation vertices in $q_r$ are a subset of the computation vertices in $q_s$.

Consider the first case. By Lemma A.4, $q_r$ and $q_s$ must be identical, a contradiction.

Next consider the second case. $q_s$ passes through a subset of the vertices in $q_r$. Let $v_x$ and $v_y$ be two vertices in $q_r$ such that $v_x$ is in this subset and $v_y$ is not. Clearly then, there is a major path $q_t$ labelled $l_j$ distinct from $q_s$ that passes through $v_y$ as illustrated in Figure A.1.


Figure  A-1 


Now assume $n_{l_i}=n_{l_j}=0$. So $PA(v_x)=PA(v_y)$, and $q_s$ and $q_t$ are distinct major paths labelled $l_j$, violating Lemma A.1. So $n_{l_i}=n_{l_j}\neq 0$.

We can similarly show that $n_{l_i}=n_{l_j}\neq 0$ in the third case also.

□ 

A  correct  mapping  must  ensure  that  no  two  values  appear  simultaneously  at  the  input  port  of  any 
processor.  As  we  see  in  the  next  lemma  this  forces  some  constraint  on  the  structure  of  major  paths. 
Lemma A.6: For any label $l_j$ and for any pair of vertices $v_x$ and $v_y$, if $\Delta_P(v_x,v_y)\,d_{l_j}=\Delta_T(v_x,v_y)\,n_{l_j}$ in any correct mapping of G then there must be a major path labelled $l_j$ passing through $v_x$ and $v_y$.

Proof: Assume that in a correct mapping there exists a pair of vertices $v_x$ and $v_y$ and a label $l_j$ such that $\Delta_P(v_x,v_y)\,d_{l_j}=\Delta_T(v_x,v_y)\,n_{l_j}$ and $v_x$ and $v_y$ are in different major paths labelled $l_j$. Let $q_1$ and $q_2$ be the two major paths such that $v_x$ is in $q_1$ and $v_y$ in $q_2$. Using Lemma A.3 it can be easily shown that for any pair of vertices $v_u$ in $q_1$ and $v_w$ in $q_2$, if $\Delta_P(v_x,v_y)\,d_{l_j}=\Delta_T(v_x,v_y)\,n_{l_j}$ then $\Delta_P(v_u,v_w)\,d_{l_j}=\Delta_T(v_u,v_w)\,n_{l_j}$. So assume without loss of generality that $v_x$ and $v_y$ are the first(1) computation vertices in $q_1$ and $q_2$ respectively. Now $n_{l_j}\in\{1,-1,0\}$. We will arrive at a contradiction for each of the three values that $n_{l_j}$ assumes.

Case 1: $n_{l_j}=0$. So $\Delta_P(v_x,v_y)=0$ as $d_{l_j}>0$. Hence by Lemma A.1 there must be a major path labelled $l_j$ passing through $v_x$ and $v_y$, a contradiction.

Case 2: $n_{l_j}=1$. Now $\Delta_P(v_x,v_y)$ must be either 0, positive or negative. Let $PA(v_x)=s_1$ and $PA(v_y)=s_2$. Let $TA(v_x)=t_1$ and $TA(v_y)=t_2$.

(A): $\Delta_P(v_x,v_y)=0$. So $s_1=s_2$. Now $n_{l_j}\neq 0$ and so $\Delta_T(v_x,v_y)=0$. Hence the input value represented by the source of $q_1$ and the input value represented by the source of $q_2$ appear simultaneously at the input port labelled $l_j$ of $s_1$, a contradiction.

(B): $\Delta_P(v_x,v_y)>0$. So $s_2>s_1$. Now $n_{l_j}=1$ and so $\Delta_T(v_x,v_y)>0$ and hence $t_2>t_1$. The input value represented by the source of $q_2$ appears at the input port labelled $l_j$ of $s_1$ at time $t_2-(s_2-s_1)d_{l_j}$. This reduces to $t_2-\Delta_T(v_x,v_y)\,n_{l_j}$, which is $t_2-(t_2-t_1)$. So the input value represented by the source of $q_1$ and that of $q_2$ appear simultaneously at the input port of $s_1$ at time $t_1$, a contradiction.

(C): $\Delta_P(v_x,v_y)<0$. So $s_1>s_2$. Now $n_{l_j}=1$ and so $\Delta_T(v_x,v_y)<0$ and so $t_1>t_2$. The input value represented by the source of $q_1$ appears at the input port labelled $l_j$ of $s_2$ at time $t_1-(s_1-s_2)d_{l_j}$. This reduces to $t_1+\Delta_T(v_x,v_y)\,n_{l_j}$, which is $t_1+(t_2-t_1)$. Hence the two input values appear simultaneously at the input port of $s_2$ at time $t_2$, a contradiction.

Case 3: $n_{l_j}=-1$. Using proofs similar to Case 2 we can show that the input values represented by the sources of $q_1$ and $q_2$ appear simultaneously at the input port labelled $l_j$ of a processor.

□ 
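The collision at the heart of Lemma A.6 can be made concrete with a short Python sketch. The function names and the sample numbers are hypothetical. The sketch assumes a neighborhood constant of +1 or -1 and models a stream on label $l_j$ as moving one processor (in the direction given by the sign of the constant) every $d_{l_j}$ cycles.

```python
def satisfies_stream_condition(px, tx, py, ty, n, d):
    """Check delta_P * d == delta_T * n for vertices mapped at (px, tx) and (py, ty)."""
    return (py - px) * d == (ty - tx) * n


def time_at_processor(p_from, t_from, p_to, n, d):
    """Cycle at which a value seen at (p_from, t_from) is at processor p_to.

    Assumes n is +1 or -1, so multiplying by n is the same as dividing by it.
    """
    hops = (p_to - p_from) * n
    return t_from + hops * d


# Case 2(B) with sample numbers: s1=2, t1=4, s2=4, t2=10, n=1, d=3.
s1, t1, s2, t2, n, d = 2, 4, 4, 10, 1, 3
print(satisfies_stream_condition(s1, t1, s2, t2, n, d))

# The value consumed at (s2, t2) was at processor s1 at this cycle,
# which coincides with t1: two values at one port in the same cycle.
print(time_at_processor(s2, t2, s1, n, d))
```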

We next show that if the neighborhood constants of any two labels in $L_1$ are equal then their delays cannot be the same.


Lemma A.7: Let $l_i$ and $l_j$ be any two labels. In any correct mapping of G, if $n_{l_i}=n_{l_j}$ then $d_{l_i}\neq d_{l_j}$.

Proof: Now $l_i$ and $l_j$ are in $L_1$, so there exists a major path $q_r$ labelled $l_i$ that is not identical to any of the major paths labelled $l_j$. This implies that there exists a major path $q_s$ labelled $l_j$ and,

1. either the computation vertices in $q_s$ and $q_r$ are the same,

2. or the computation vertices in $q_s$ are a subset of the computation vertices in $q_r$,

3. or the computation vertices in $q_r$ are a subset of the computation vertices in $q_s$.

Consider the first case. By Lemma A.4, $q_r$ and $q_s$ must be identical, a contradiction.

Next consider the second case. $q_s$ passes through a subset of the vertices in $q_r$. Let $v_x$ and $v_y$ be two vertices in $q_r$ such that $v_x$ is in the subset and $v_y$ is not. Then there is a major path $q_t$ labelled $l_j$ distinct from $q_s$ that passes through $v_y$ as illustrated in Figure A.2.


(1) the vertex adjacent to a source vertex in a major path

Figure A-2


By Lemma A.3, $\Delta_P(v_x,v_y)\,d_{l_i}=\Delta_T(v_x,v_y)\,n_{l_i}$. Assume $d_{l_i}=d_{l_j}$ and hence $\Delta_P(v_x,v_y)\,d_{l_j}=\Delta_T(v_x,v_y)\,n_{l_j}$. By Lemma A.6, $v_x$ and $v_y$ must be in the same major path labelled $l_j$, a contradiction.

We can similarly show that $n_{l_i}=n_{l_j}$ implies $d_{l_i}\neq d_{l_j}$ for the third case also.

□ 


A.2  Connected  Components 

We now examine the relationship between correct mapping and connected components. In particular let $l_\mu\in L_1$ and $l_\nu\in L_1$. Let S be a connected component obtained by removing from G all the edges and source and sink vertices whose labels are in $L_G-\{l_\mu,l_\nu\}$. In general several such components may result and S is one such component.

Let $S_{l_\mu}$ = {major paths labelled $l_\mu$ in S} and $S_{l_\nu}$ = {major paths labelled $l_\nu$ in S}.

Let $G^\mu=\langle V^\mu,E^\mu\rangle$ and $G^\nu=\langle V^\nu,E^\nu\rangle$ be two directed graphs and $F_\mu$ and $F_\nu$ be two one-one functions such that

1. $F_\mu: S_{l_\mu}\rightarrow V^\mu$ (the major paths in $S_{l_\mu}$ are represented by the vertices in $V^\mu$)

2. $E^\mu=\{\langle q_m,q_n\rangle \mid q_m\in S_{l_\mu},\ q_n\in S_{l_\mu}$ and there exists a directed edge labelled $l_\nu$ from some computation vertex in $q_m$ to some computation vertex in $q_n\}$

3. $F_\nu: S_{l_\nu}\rightarrow V^\nu$ (the major paths in $S_{l_\nu}$ are represented by the vertices in $V^\nu$)

4. $E^\nu=\{\langle q_m,q_n\rangle \mid q_m\in S_{l_\nu},\ q_n\in S_{l_\nu}$ and there exists a directed edge labelled $l_\mu$ from some computation vertex in $q_m$ to some computation vertex in $q_n\}$

We  are  now  in  a  position  to  establish  the  first  fundamental  result  concerning  the  structure  imposed  on  S 
by  any  correct  mapping. 

Lemma A.8: If there exists a syntactically correct mapping for G then S must satisfy the following conditions.

1. $G^\mu$ must be acyclic, and there must be a unique directed path between any pair of vertices in $V^\mu$.

2. $G^\nu$ must be acyclic, and there must be a unique directed path between any pair of vertices in $V^\nu$.

Proof: The proofs for (1) and (2) are similar and we thus only prove (1).

We will first show that $G^\mu$ is acyclic. Suppose there is a cycle in $G^\mu$. Let $q_1, q_2, \ldots, q_m$ be the set of vertices in $V^\mu$ that form a cycle in $G^\mu$ as shown in Figure A.3.


Figure A-3


This cycle implies that, between any pair $v_x$ and $v_y$ of not necessarily distinct computation vertices in $q_1$, there exists a path p through computation vertices in each of $q_2, q_3, \ldots, q_m$ and through edges labelled $l_\mu$ or $l_\nu$ as shown in Figure A.4, wherein the 'horizontal edges' are labelled $l_\mu$ and the 'non-horizontal edges' are labelled $l_\nu$.


Figure A-4


Let $k_\mu^+$ and $k_\mu^-$ be the number of edges labelled $l_\mu$ in p whose directions are consistent and not consistent respectively with the direction imposed on them by the directed path passing through the same sequence of vertices as in p. Let $k_\mu^+-k_\mu^-=h$.

Similarly let $k_\nu^+$ and $k_\nu^-$ be the number of edges labelled $l_\nu$ in p whose directions are consistent and not consistent respectively with the direction imposed on them by the directed path passing through the same sequence of vertices as in p. As m vertices in $G^\mu$ form a cycle, $k_\nu^+-k_\nu^-=m$ and clearly $m\geq 1$. By Lemma A.2,

$$\Delta_P(v_x,v_y)=n_{l_\mu}h+n_{l_\nu}m$$

and

$$\Delta_T(v_x,v_y)=d_{l_\mu}h+d_{l_\nu}m$$

Let the distance between $v_x$ and $v_y$ in the major path $q_1$ be k. Hence

$$\Delta_P(v_x,v_y)=n_{l_\mu}k$$

and

$$\Delta_T(v_x,v_y)=d_{l_\mu}k$$

and so

$$n_{l_\mu}k=n_{l_\mu}h+n_{l_\nu}m \qquad (a)$$

$$d_{l_\mu}k=d_{l_\mu}h+d_{l_\nu}m \qquad (b)$$

By Lemma A.5, $n_{l_\mu}=n_{l_\nu}$ implies $n_{l_\mu}=n_{l_\nu}\neq 0$, and hence the possible values that $\langle n_{l_\mu},n_{l_\nu}\rangle$ can assume are <1,0>, <1,1>, <1,-1>, <-1,0>, <-1,1>, <-1,-1>, <0,1> and <0,-1>. We will arrive at a contradiction for each of these values that $\langle n_{l_\mu},n_{l_\nu}\rangle$ assumes.

1. Consider the set of values <1,1> and <-1,-1>, that is, $n_{l_\mu}=n_{l_\nu}$. From (a) and (b) $d_{l_\mu}=d_{l_\nu}$, a contradiction since by Lemma A.7, $d_{l_\mu}\neq d_{l_\nu}$.

2. Consider the set of values <0,1> and <0,-1>, that is, $n_{l_\mu}=0$ and $n_{l_\nu}\in\{1,-1\}$. From (a) $n_{l_\nu}=0$, a contradiction as $n_{l_\nu}\neq 0$.

3. Consider the set of values <1,0> and <-1,0>, that is, $n_{l_\mu}\in\{1,-1\}$ and $n_{l_\nu}=0$. From (a) and (b) $d_{l_\nu}=0$, a contradiction as $d_{l_\nu}>0$.

4. Consider the set of values <-1,1> and <1,-1>, that is, $n_{l_\mu}=-n_{l_\nu}$ and $n_{l_\mu}\in\{1,-1\}$. From (a) and (b) $d_{l_\mu}=-d_{l_\nu}$, a contradiction as $d_{l_\mu}>0$ and $d_{l_\nu}>0$.


So we have arrived at contradictions when $G^\mu$ has a cycle and hence $G^\mu$ must be acyclic.
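The case analysis for a cycle can be cross-checked by brute force. The Python sketch below is an illustration, not part of the proof: it writes x for k-h, so that the two cycle equations read x·n_mu = m·n_nu and x·d_mu = m·d_nu, and searches small ranges (chosen arbitrarily here) for positive delays, m at least 1, and pairs of neighborhood constants other than <0,0>. Pairs with equal constants and equal delays are skipped, as Lemma A.7 excludes them. The function name and the bounds are assumptions.

```python
from itertools import product


def cycle_solutions(max_delay=4, max_m=3, max_diff=6):
    """Search for values satisfying both cycle equations; x stands for k - h."""
    allowed_n = [(a, b) for a, b in product((-1, 0, 1), repeat=2)
                 if not (a == 0 and b == 0)]  # <0,0> is excluded by Lemma A.5
    found = []
    for (n_mu, n_nu), d_mu, d_nu, m, x in product(
            allowed_n,
            range(1, max_delay + 1),          # delays are positive
            range(1, max_delay + 1),
            range(1, max_m + 1),              # at least one edge labelled l_nu
            range(-max_diff, max_diff + 1)):
        if n_mu == n_nu and d_mu == d_nu:
            continue                          # excluded by Lemma A.7
        if x * n_mu == m * n_nu and x * d_mu == m * d_nu:
            found.append((n_mu, n_nu, d_mu, d_nu, m, x))
    return found


print(cycle_solutions())  # an empty list means no consistent assignment was found
```

An empty result over these ranges agrees with the four cases above; it does not replace them, since the search is bounded.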


We next show that there must be a directed path between any pair of vertices in $V^\mu$. Suppose not. Then let $q_s$ and $q_t$ be two vertices in $V^\mu$ that do not have a directed path between them. Now $G^\mu$ is connected and so there must be a $q_k$ in $V^\mu$ such that one of the following two cases must occur.

1. There are two vertex disjoint directed paths; one from $q_s$ to $q_k$ and the other from $q_t$ to $q_k$.

2. There are two vertex disjoint directed paths; one from $q_k$ to $q_s$ and the other from $q_k$ to $q_t$.


We will only consider the first case; the proof for the second case is similar. Let $q_m$ be the vertex adjacent to $q_k$ in the directed path from $q_s$ to $q_k$ and $q_n$ be the vertex adjacent to $q_k$ in the directed path from $q_t$ to $q_k$ as shown in Figure A.5.


Figure A-5

Now $q_m$, $q_n$, $q_k$, $q_s$ and $q_t$ are all major paths labelled $l_\mu$ in G. Existence of a directed edge from $q_m$ to $q_k$ in $G^\mu$ in Figure A.5 implies there exist computation vertices $v_x$ in $q_m$ and $v_w$ in $q_k$ and a directed edge $e_a$ labelled $l_\nu$ from $v_x$ to $v_w$. Similarly, existence of a directed edge from $q_n$ to $q_k$ in $G^\mu$ implies there exist computation vertices $v_y$ in $q_n$ and $v_u$ in $q_k$ and a directed edge $e_b$ labelled $l_\nu$ from $v_y$ to $v_u$ as illustrated in Figure A.6.
Figure A-6


In Figure A.6 each of the shaded boxes denotes a major path labelled $l_\mu$. Let the distance between $v_w$ and $v_u$ in $q_k$ be h and hence in any correct mapping,

$$\Delta_P(v_w,v_u)=n_{l_\mu}h$$

and

$$\Delta_T(v_w,v_u)=d_{l_\mu}h$$

As there is a directed edge from $v_x$ to $v_w$,

$$\Delta_P(v_x,v_w)=n_{l_\nu}$$

and

$$\Delta_T(v_x,v_w)=d_{l_\nu}$$

Also as there is a directed edge from $v_y$ to $v_u$,

$$\Delta_P(v_y,v_u)=n_{l_\nu}$$

and

$$\Delta_T(v_y,v_u)=d_{l_\nu}$$

From the above equations we obtain,

$$\Delta_P(v_x,v_y)=n_{l_\mu}h$$

and

$$\Delta_T(v_x,v_y)=d_{l_\mu}h$$

Now by Lemma A.6, $v_x$ and $v_y$ must be in the same major path labelled $l_\mu$. But $q_m$ and $q_n$ are distinct, a contradiction.

Lastly we must show that the directed path between any pair of vertices in $G^\mu$ is unique. Suppose not. Let $q_m$ and $q_n$ be two vertices in $G^\mu$ such that there are two distinct directed paths from $q_m$ to $q_n$. Let $\{q_m, \ldots, q_s, q_t, \ldots, q_n\}$ and $\{q_m, \ldots, q_s, q_r, \ldots, q_n\}$ be the two sequences of vertices traversed by the first and second directed paths respectively. Let $q_t$ and $q_r$ be distinct. So the two sequences differ after $q_s$. We have already shown that there must be a directed path between any pair of vertices in $G^\mu$. Without loss of generality let there be a directed path from $q_r$ to $q_t$. So now there are two directed paths from $q_s$ to $q_t$. The first directed path is a directed edge from $q_s$ to $q_t$ and the second directed path is through the sequence of vertices $\{q_r, q_{r1}, \ldots, q_{rj}\}$. These two directed paths imply the existence of computation vertices $v_x$ and $v_y$ in $q_s$ and $q_t$ respectively and two paths $p_1$ and $p_2$ between them as shown in Figure A.7.


Figure A-7


The first path $p_1$ between $v_x$ and $v_y$ traverses the edge $e_a$ labelled $l_\nu$. The second path $p_2$ is through computation vertices in $q_r, q_{r1}, \ldots, q_{rj}$. Let $k_1^+$ and $k_1^-$ be the number of edges labelled $l_\mu$ in $p_1$ whose directions are consistent and not consistent respectively with the direction imposed on them by the directed path from $v_x$ to $v_y$ passing through the same sequence of vertices as in $p_1$. Let $k_1^+-k_1^-=h_1$. For this path from Lemma A.2, we obtain,

$$\Delta_P(v_x,v_y)=h_1n_{l_\mu}+n_{l_\nu}$$

$$\Delta_T(v_x,v_y)=h_1d_{l_\mu}+d_{l_\nu}$$

Let $k_2^+$ and $k_2^-$ be the number of edges labelled $l_\mu$ in $p_2$ whose directions are consistent and not consistent respectively with the direction imposed on them by the directed path from $v_x$ to $v_y$ passing through the same sequence of vertices as in $p_2$. Let $k_2^+-k_2^-=h_2$. Also let $k_\nu^+$ and $k_\nu^-$ be the number of edges labelled $l_\nu$ in $p_2$ whose directions are consistent and not consistent respectively with the direction imposed on them by the directed path from $v_x$ to $v_y$ passing through the same sequence of vertices as in $p_2$. Let $k_\nu^+-k_\nu^-=m$. The distance from $q_r$ to $q_t$ must be at least 1 and so m>1. For the second path $p_2$, from Lemma A.2 again, we obtain,

$$\Delta_P(v_x,v_y)=h_2n_{l_\mu}+n_{l_\nu}m$$

$$\Delta_T(v_x,v_y)=h_2d_{l_\mu}+d_{l_\nu}m$$

and so

$$(h_1-h_2)\,n_{l_\mu}=(m-1)\,n_{l_\nu} \qquad (c)$$

$$(h_1-h_2)\,d_{l_\mu}=(m-1)\,d_{l_\nu} \qquad (d)$$

(c) and (d) are similar to (a) and (b) that resulted from a cycle in $G^\mu$. Hence any solution to (c) and (d) would lead to contradictions, and hence the directed path between any pair of vertices in $G^\mu$ must be unique.

□ 

We establish the link between Mesh Graphs and S through the following lemma.

Lemma A.9: S is a Mesh Graph if and only if the following conditions are satisfied:

1. $G^\mu$ is acyclic, and there must exist a unique directed path between any pair of vertices in $V^\mu$.

2. $G^\nu$ is acyclic, and there must exist a unique directed path between any pair of vertices in $V^\nu$.

Proof:

(Only If): Simple.

(If Part): Let $V_S$ be the set of computation vertices in S. Topologically sort the vertices of $G^\mu$ and $G^\nu$. Assign indices ranging from 0 to $|V^\mu|-1$ to the topologically sorted vertices in $V^\mu$. Let $I_1$ denote the sequence of indices ranging from 0 to $|V^\mu|-1$. Similarly assign indices ranging from 0 to $|V^\nu|-1$ to the topologically sorted vertices in $V^\nu$. Let $I_2$ denote the sequence of indices ranging from 0 to $|V^\nu|-1$.

We next construct a set $B\subseteq I_1\times I_2$ and a one-one function $F:V_S\rightarrow B$. To begin with let $B=\emptyset$. Let $q_m$ and $q_n$ be any two vertices in $V^\mu$ and $V^\nu$ respectively. Now $q_m$ and $q_n$ are major paths labelled $l_\mu$ and $l_\nu$ respectively in S. Let $v_x$ be the computation vertex in $q_m$ and $q_n$. Let a and b be the indices assigned to $q_m$ and $q_n$ respectively by the topological sort of vertices in $V^\mu$ and $V^\nu$ respectively. Then let $F(v_x)=\langle b,a\rangle$ and $B=B\cup\{\langle b,a\rangle\}$. Using conditions (1) and (2) of the lemma it can be easily shown that F is a one-one function that transforms S into a Mesh Graph.

□ 
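The construction in the (If Part) can be sketched in Python using the standard library's `graphlib.TopologicalSorter`. The data layout (dictionaries mapping a major-path name to its set of computation vertices, and edge lists for the two derived graphs), the function names, and the 2x2 example are assumptions made for this illustration.

```python
from graphlib import TopologicalSorter


def index_paths(nodes, edges):
    """Topologically sort a DAG and return a node -> index map starting at 0."""
    sorter = TopologicalSorter({node: set() for node in nodes})
    for src, dst in edges:
        sorter.add(dst, src)  # dst must come after src
    return {node: index for index, node in enumerate(sorter.static_order())}


def mesh_coordinates(mu_paths, nu_paths, mu_graph_edges, nu_graph_edges):
    """Assign F(v) = (b, a): b indexes the l_nu path of v, a indexes its l_mu path."""
    a = index_paths(mu_paths, mu_graph_edges)
    b = index_paths(nu_paths, nu_graph_edges)
    F = {}
    for qm, verts_m in mu_paths.items():
        for qn, verts_n in nu_paths.items():
            for v in verts_m & verts_n:
                F[v] = (b[qn], a[qm])
    return F


# Hypothetical 2x2 component: rows are major paths labelled l_mu,
# columns are major paths labelled l_nu.
mu_paths = {"R0": {"v00", "v01"}, "R1": {"v10", "v11"}}
nu_paths = {"C0": {"v00", "v10"}, "C1": {"v01", "v11"}}
F = mesh_coordinates(mu_paths, nu_paths, [("R0", "R1")], [("C0", "C1")])
print(F)
```

When conditions (1) and (2) hold, each derived graph is a chain, so the topological order (and hence the indexing) is unique.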

We  are  now  in  a  position  to  establish  our  fundamental  result  relating  S,  Mesh  Graphs  and  correct 
mapping. 

Theorem  A.1:  If  there  exists  a  syntactically  correct  mapping  for  G  then  S  must  be  a  Mesh  Graph. 

Proof:  Straightforward  from  Lemma  A.8  and  Lemma  A.9. 

□ 

Theorem 3.1 captured the structure of the minimally labelled component $S_G$ of $G\in\Theta$ and its proof is an immediate consequence of Theorem A.1.


A.3  Properties  of  Mesh  Graphs 

We examine some properties of Mesh Graphs that we will be using later on. For purposes of examining these properties alone we will assume that the connected component S is a Mesh Graph.

Let $v_x$ and $v_y$ be any two computation vertices in S. Consider any path p between $v_x$ and $v_y$. Let $k_\mu^+$ and $k_\mu^-$ be the number of edges labelled $l_\mu$ in p whose directions are consistent and not consistent respectively with the direction induced on them by the directed path from $v_x$ to $v_y$ through the same sequence of vertices as in p. Similarly let $k_\nu^+$ and $k_\nu^-$ be the number of edges labelled $l_\nu$ in p whose directions are consistent and not consistent respectively with the direction induced on them by the directed path from $v_x$ to $v_y$ through the same sequence of vertices as in p. In the following lemma we relate $\langle x_{l_\mu},x_{l_\nu}\rangle$ and $\langle y_{l_\mu},y_{l_\nu}\rangle$ to $k_\mu^+$, $k_\mu^-$, $k_\nu^+$ and $k_\nu^-$.

Lemma A.10: $k_\mu^+-k_\mu^-=y_{l_\mu}-x_{l_\mu}$ and $k_\nu^+-k_\nu^-=y_{l_\nu}-x_{l_\nu}$.

Proof: The proof is by induction on the path length. Let n denote the path length. $v_x$ and $v_y$ are distinct and hence n>0.


Basis Step: n=1; so the path consists of only one edge. Hence only one of $k_\mu^+$, $k_\mu^-$, $k_\nu^+$ and $k_\nu^-$ can be 1 and the rest must be 0.

We will show the case $k_\mu^+=1$. So $k_\mu^-=k_\nu^+=k_\nu^-=0$. This implies the path is a directed edge labelled $l_\mu$ from $v_x$ to $v_y$. By definition of a Mesh Graph then $y_{l_\mu}-x_{l_\mu}=1$ and $y_{l_\nu}-x_{l_\nu}=0$. Similarly we can prove the basis is true for the other three cases also.


Induction Step: Assume the lemma is true for paths of length $\leq n$. Consider any path from $v_x$ to $v_y$ of length n+1. If n+1=1 then the lemma holds by the basis. So assume n+1>1 and let $v_z$ be any intermediate vertex in this path. Let $n_1$ and $n_2$ denote the path lengths from $v_x$ to $v_z$ and from $v_z$ to $v_y$ in this path. Clearly $n_1\leq n$ and $n_2\leq n$. By applying the induction hypothesis to each of these two paths it follows that the lemma is true for paths of length $\leq n+1$.

□ 

Now consider any correct mapping of S and let $v_x$ and $v_y$ be any two computation vertices in S. We relate the processors and the times at which they are mapped in the following lemma.

Lemma A.11: $\Delta_P(v_x,v_y)=(y_{l_\mu}-x_{l_\mu})\,n_{l_\mu}+(y_{l_\nu}-x_{l_\nu})\,n_{l_\nu}$ and $\Delta_T(v_x,v_y)=(y_{l_\mu}-x_{l_\mu})\,d_{l_\mu}+(y_{l_\nu}-x_{l_\nu})\,d_{l_\nu}$.

Proof: Straightforward from Lemma A.10 and Lemma A.2.

□ 
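Lemma A.10 and Lemma A.11 together say that the displacement between two vertices of a Mesh Graph depends only on their coordinates, not on the path taken. The Python sketch below illustrates this with hypothetical names and sample values; coordinates are written as (mu-component, nu-component) pairs, and a path is a list of (label, direction) pairs with direction +1 or -1.

```python
def delta_from_coordinates(x, y, n, d):
    """Lemma A.11: displacement computed from coordinates alone.

    x, y: (mu, nu) coordinates; n, d: (value for l_mu, value for l_nu).
    """
    step_mu = y[0] - x[0]
    step_nu = y[1] - x[1]
    delta_p = step_mu * n[0] + step_nu * n[1]
    delta_t = step_mu * d[0] + step_nu * d[1]
    return delta_p, delta_t


def delta_from_path(path, n, d):
    """Lemma A.2: displacement accumulated edge by edge along a path."""
    index = {"mu": 0, "nu": 1}
    delta_p = sum(sign * n[index[label]] for label, sign in path)
    delta_t = sum(sign * d[index[label]] for label, sign in path)
    return delta_p, delta_t


n, d = (1, 1), (1, 2)          # sample neighborhood constants and delays
x, y = (0, 0), (2, 1)          # sample coordinates of v_x and v_y

# A path with a detour: net two l_mu edges and one l_nu edge forward.
detour = [("mu", 1), ("nu", 1), ("nu", 1), ("mu", 1), ("nu", -1)]

print(delta_from_coordinates(x, y, n, d))
print(delta_from_path(detour, n, d))
```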

We next establish a fundamental property of Mesh Graphs. This property relates the existence of a directed path between two computation vertices in a Mesh Graph to certain relationships between their coordinates. This is useful in the proof of Theorem 3.2 wherein we show that certain graphs in Θ can never be mapped correctly.

To prove this property the following lemma is useful.

Lemma A.12: Let $l_i\in\{l_\mu,l_\nu\}$ and let $q_m$, $q_n$ and $q_k$ be three distinct major paths labelled $l_i$. If the indices m, n and k of $q_m$, $q_n$ and $q_k$ respectively are such that m<k<n, then any path between any computation vertex in $q_m$ and any computation vertex in $q_n$ must pass through a computation vertex in $q_k$.

Proof: Let $l_i=l_\mu$ and let $v_x$, $v_y$ and $v_z$ be any three computation vertices in $q_m$, $q_n$ and $q_k$ respectively. Indices of $q_m$, $q_n$ and $q_k$ are m, n and k respectively and hence $F_{l_\nu}(v_x)=m$, $F_{l_\nu}(v_y)=n$ and $F_{l_\nu}(v_z)=k$.

Now assume that the path does not pass through any computation vertex in $q_k$. Then the path must traverse an edge labelled $l_\nu$ between two computation vertices in major paths $q_s$ and $q_r$ that are labelled $l_\mu$ such that if s and r are the indices of $q_s$ and $q_r$ respectively then s<k<r. By Lemma A.10, the number of edges labelled $l_\nu$ in any path from $q_s$ to $q_r$ is r-s. Since s<k<r and k, r and s are integers, $r-s\geq 2$. But as there is also an edge labelled $l_\nu$ between a computation vertex in $q_s$ and a computation vertex in $q_r$, it follows from the definition of a Mesh Graph that r-s=1, a contradiction.

Using similar arguments we can show that the lemma is true for $l_i=l_\nu$.

□ 

The  following  result  is  a  straightforward  consequence  of  the  previous  lemma. 

Corollary A.1: Let $l_i \in \{l_\mu, l_\nu\}$ and $l_j \in \{l_\mu, l_\nu\}$ and let $l_i \ne l_j$. Let $q_m$ and $q_n$ be two distinct major paths labelled $l_i$. If their indices $m$ and $n$ differ by 1 then any path between a computation vertex in $q_m$ and a computation vertex in $q_n$ must traverse an edge labelled $l_j$ between computation vertices in $q_m$ and $q_n$ respectively.

Proof: Without loss of generality let $m = n+1$, where $m$ and $n$ are the indices of $q_m$ and $q_n$ respectively. Now pick a path from some computation vertex in $q_m$, say $v_x$, to some computation vertex in $q_n$, say $v_y$, such that it does not traverse an edge between any pair of computation vertices in $q_m$ and $q_n$. Then there must be a computation vertex in this path, say $v_z$, distinct from $v_x$ and $v_y$. Let $v_z$ be in the major path $q_s$ and let $s$ be the index of $q_s$. If $s > m$ then the path from $v_x$ to $v_z$ violates Lemma A.12 and if $s < m$ then the path from $v_z$ to $v_y$ violates Lemma A.12.


□ 


We  are  now  ready  to  establish  a  fundamental  property  of  Mesh  Graphs. 


Lemma A.13: Let $v_x$ and $v_y$ be any pair of computation vertices such that $y_{l_\mu} \ge x_{l_\mu}$ and $y_{l_\nu} \ge x_{l_\nu}$. Then there must exist a directed path from $v_x$ to $v_y$.

Proof: Let $y_{l_\mu} - x_{l_\mu} = m$ and $y_{l_\nu} - x_{l_\nu} = n$. The proof is an induction on $n$.


Basis Step:

We need to consider the case when $m = 0$ and $n > 0$ and the case when $m > 0$ and $n = 0$.

Case 1: $m = 0$ and $n > 0$. By the definition of a Mesh Graph, there must be a directed path from $v_x$ to $v_y$ in some major path labelled $l_\nu$.

Case 2: $m > 0$ and $n = 0$. By the definition of a Mesh Graph again, there must be a directed path from $v_x$ to $v_y$ in some major path labelled $l_\mu$.


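Induction Step: Assume the lemma holds for any pair of vertices $v_x$ and $v_y$ such that $0 \le y_{l_\mu} - x_{l_\mu} \le m$ and $0 \le y_{l_\nu} - x_{l_\nu} \le n$. We will show that it holds for any $v_x$ and $v_y$ such that $0 \le y_{l_\mu} - x_{l_\mu} \le m+1$ and $0 \le y_{l_\nu} - x_{l_\nu} \le n+1$. To do this we have to consider the following three cases.

1. $y_{l_\mu} - x_{l_\mu} = m+1$ and $y_{l_\nu} - x_{l_\nu} \le n$.

2. $y_{l_\mu} - x_{l_\mu} \le m$ and $y_{l_\nu} - x_{l_\nu} = n+1$.

3. $y_{l_\mu} - x_{l_\mu} = m+1$ and $y_{l_\nu} - x_{l_\nu} = n+1$.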

The  following  geometric  picture  comes  in  useful  in  understanding  the  proof. 


Figure  A-8 


The lines GH, IJ and KL denote major paths labelled $l_\nu$. The index of GH is $x_{l_\mu}$ and the indices of IJ and KL are $m + x_{l_\mu}$ and $m + 1 + x_{l_\mu}$ respectively. The lines AB, CD and EF denote major paths labelled $l_\mu$. The index of AB is $x_{l_\nu}$ and the indices of CD and EF are $n + x_{l_\nu}$ and $n + 1 + x_{l_\nu}$. The induction hypothesis holds for $v_x$ and any $v_y$ within the region enclosed by AB, CD, GH and IJ, which is the shaded region in the above figure.

We first proceed to establish that the lemma holds for any $v_y$ such that $y_{l_\mu} - x_{l_\mu} = m+1$ and $0 \le y_{l_\nu} - x_{l_\nu} \le n$. Consider one such vertex $v_y$ as shown in Figure A.9.

Figure A-9

From Corollary A.1, any path from $v_x$ to $v_y$ must traverse an edge labelled $l_\mu$ between vertices in IJ and KL. Let $v_u$ and $v_w$ be the two vertices in IJ and KL respectively. Now $v_u$ and $v_w$ must appear in one of the following three regions in Figure A.9.


1.  Above  AB 

2.  Within  AB  and  MN 

3.  Below  MN 


Figures A.10(a), A.10(b) and A.10(c) illustrate cases 1, 2 and 3 respectively.

Figure A-10(a)

Figure A-10(b)

Figure A-10(c)

Case 1: By the definition of a Mesh Graph, $v_z$ must exist in AB. Then there is a directed path from $v_x$ to $v_z$ and from $v_z$ to $v_y$.

Case 2: By the inductive assumption, there is a directed path from $v_x$ to $v_u$. The edge labelled $l_\mu$ is directed from $v_u$ to $v_w$. By the Mesh Graph definition there is a directed path from $v_w$ to $v_y$.

Case 3: We now show that whenever $v_u$ and $v_w$ occur below MN then $v_z$ must always exist. Suppose not. Then by definition of Mesh Graph there cannot be any vertex on IJ above MN and on MN to the left of IJ. Consider any path $p_1$ between $v_x$ and any vertex, say $v_s$, on IJ. $v_s$ must be below MN. Then by Lemma A.12, there must exist a vertex, say $v_r$, on MN in the path $p_1$, and $v_r$ precedes $v_s$ in $p_1$. As $v_z$ does not exist, $v_r$ must be to the right of IJ. Consider any path $p_2$ between $v_x$ and $v_r$. By Lemma A.12 again, there must exist a vertex, say $v_t$, on IJ in the path $p_2$, and $v_t$ precedes $v_r$ in $p_2$. So there exists a path between $v_x$ and $v_t$ and no vertex on MN in this path, a contradiction. So $v_z$ must exist. Therefore by the inductive assumption there is a directed path from $v_x$ to $v_z$. The edge labelled $l_\mu$ is directed from $v_z$ to $v_y$.

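We can similarly show that the lemma holds $\forall v_x$ and $\forall v_y$ such that $y_{l_\mu} - x_{l_\mu} \le m$ and $y_{l_\nu} - x_{l_\nu} = n+1$, and also holds $\forall v_x$ and $\forall v_y$ such that $y_{l_\mu} - x_{l_\mu} = m+1$ and $y_{l_\nu} - x_{l_\nu} = n+1$.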

□ 

We  have  established  all  the  relevant  results  to  prove  Theorem  3.2. 

Proof:  (of  Theorem  3.2) 

(Only If Part): Consider a correct mapping of G. Now by Theorem 3.1, $S_G$ must be a Mesh Graph. We construct a Main Diagonalization of $S_G$ as follows. If $\langle n_{l_\mu}, n_{l_\nu} \rangle \in \{\langle 1,1 \rangle, \langle 1,-1 \rangle, \langle 1,0 \rangle, \langle 0,1 \rangle\}$ then let $w = \langle w_1, w_2 \rangle = \langle n_{l_\mu}, n_{l_\nu} \rangle$ and $n_{l_j}$ be the consistency constant of $a_{l_j}$. Otherwise let $w = \langle -n_{l_\mu}, -n_{l_\nu} \rangle$ and $-n_{l_j}$ be the consistency constant of $a_{l_j}$.

We will prove that each of the three conditions is necessary when $\langle n_{l_\mu}, n_{l_\nu} \rangle = \langle 1,-1 \rangle$, as the proof for any other value that it assumes is similar.

$\langle n_{l_\mu}, n_{l_\nu} \rangle = \langle 1,-1 \rangle$ and so by the above construction of a Main Diagonalization the diagonalization factor $w = \langle 1,-1 \rangle$. Hence the complementary diagonalization factor $w^c = \langle 0,1 \rangle$.

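(1) Consider any edge labelled $l_j$ directed from $v_x$ to $v_y$. Now $\Delta P(v_x, v_y) = n_{l_j} \in \{1, -1, 0\}$. Also by Lemma A.11, $\Delta P(v_x, v_y) = (y_{l_\mu} - x_{l_\mu}) - (y_{l_\nu} - x_{l_\nu})$ and so $\Delta P(v_x, v_y) = \Delta D(v_x, v_y)$ and hence $a_{l_j}$ is consistent with respect to $T_D$.

So consistency of $a_{l_j}$ with respect to $T_D$ ensures that adjacent vertices are mapped on neighboring processors.

(2) Now

$$\Delta P(v_x, v_y) = (y_{l_\mu} - x_{l_\mu}) - (y_{l_\nu} - x_{l_\nu}) = n_{l_j}.$$

Also

$$\Delta T(v_x, v_y) = (y_{l_\mu} - x_{l_\mu})\, d_{l_\mu} + (y_{l_\nu} - x_{l_\nu})\, d_{l_\nu} \quad \text{and} \quad \Delta T(v_x, v_y) = d_{l_j}.$$

As $n_{l_j}$, $d_{l_j}$, $d_{l_\mu}$ and $d_{l_\nu}$ are all constants, $(y_{l_\nu} - x_{l_\nu})$ is a constant. $w^c = \langle 0,1 \rangle$ and so $\Delta D^c(v_x, v_y) = (y_{l_\nu} - x_{l_\nu})$ and hence $b_{l_j}$ is consistent with respect to $T_{D^c}$.

Consistency of $b_{l_j}$ with respect to $T_{D^c}$ ensures that elements in a data stream travel at a constant velocity.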

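(3) Let $a = (y_{l_\mu} - x_{l_\mu})$ and $b = (y_{l_\nu} - x_{l_\nu})$. We have already proved that $a_{l_j}$ and $b_{l_j}$ are consistent with respect to $T_D$ and $T_{D^c}$ respectively and hence we easily obtain $d_{l_j} = (m_{l_j} + c_{l_j})\, d_{l_\mu} + c_{l_j}\, d_{l_\nu}$.

From Lemma A.11, $\Delta P(v_x, v_y) = a - b = \Delta D(v_x, v_y)$ and $\Delta T(v_x, v_y) = d_{l_\mu} a + d_{l_\nu} b$.

Now $w^c = \langle 0,1 \rangle$ and so $\Delta D^c(v_x, v_y) = b$. Also $c_{l_j} \Delta D(v_x, v_y) = m_{l_j} \Delta D^c(v_x, v_y)$ and so $c_{l_j}(a - b) = m_{l_j} b$.

Now

$$\Delta P(v_x, v_y)\, d_{l_j} = (a - b)\, d_{l_j}$$

$$= [(m_{l_j} + c_{l_j})\, d_{l_\mu} + c_{l_j}\, d_{l_\nu}]\,(a - b)$$

$$= m_{l_j}\, \Delta T(v_x, v_y) + (d_{l_\mu} + d_{l_\nu})\,[a\, c_{l_j} - b\,(m_{l_j} + c_{l_j})]$$

$$= m_{l_j}\, \Delta T(v_x, v_y)$$

and so from Lemma A.6, there must be a major path labelled $l_j$ passing through $v_x$ and $v_y$.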

Satisfaction  of  this  condition  ensures  that  no  two  values  appear  simultaneously  at  the  input  port  of  any 
processor. 
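The algebra in part (3) can be checked numerically. The sketch below is only an illustration, not part of the original proof: it enumerates small integer values and confirms that whenever $c_{l_j}(a - b) = m_{l_j} b$, the identity $\Delta P \cdot d_{l_j} = m_{l_j} \Delta T$ holds. The function name and the value ranges are assumptions made for the example.

```python
from itertools import product


def check_part3_identity(limit=4):
    """Brute-force check of Delta_P * d_j == m_j * Delta_T.

    Only combinations satisfying c_j * (a - b) == m_j * b are tested,
    mirroring the hypothesis used in part (3). Returns the number of
    combinations checked; raises AssertionError if the identity fails.
    """
    checked = 0
    ranges = product(
        (-1, 0, 1),                 # m_j
        range(-3, 4),               # c_j
        range(1, 4),                # d_mu
        range(1, 4),                # d_nu
        range(-limit, limit + 1),   # a
        range(-limit, limit + 1),   # b
    )
    for m_j, c_j, d_mu, d_nu, a, b in ranges:
        if c_j * (a - b) != m_j * b:
            continue
        d_j = (m_j + c_j) * d_mu + c_j * d_nu
        delta_p = a - b
        delta_t = d_mu * a + d_nu * b
        assert delta_p * d_j == m_j * delta_t
        checked += 1
    return checked


print(check_part3_identity() > 0)  # True
```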

(If Part): Let $D = \{D_1, D_2, \ldots, D_\alpha\}$ be the set of main diagonals where $i$ denotes the index of any $D_i \in D$. Construct a linear array LA with $|N| = \alpha$. Now construct a mapping through the following steps.

1. Choose two-phase clocking if there exists a transitive edge labelled $l_j$ such that $m_{l_j} = 0$ or else choose a single-phase clocking scheme.

2. Let $D_q$ be any diagonal in $D$ and let $v_x$ be any computation vertex in $D_q$. Then, let $PA(v_x) = q$. This assigns computation vertices to processors.

3. Next fix the neighborhood constant $n_{l_j}$ and delay constant $d_{l_j}$ for every label $l_j$ in $L_1$. Let $n_{l_j} = m_{l_j}$. Let $d_a$ and $d_b$ be two constants which we will be using in the construction of the delays for the labels in $L_1$. If the main diagonalization factor $w$ is $\langle 1,-1 \rangle$ or there exists a transitive edge labelled $l_j$ such that $m_{l_j} = 0$ then let $d_a = 2$ else let $d_a = 1$. Let $c_{min}$ be the minimum of all consistency constants among all the relations in $S_{D^c}$. If $c_{min} \ge 0$ then set $d_b = 1$ else set $d_b = 1 + |c_{min}|\, d_a$. Let $d_{l_j} = m_{l_j} d_b + c_{l_j} d_a$.

4. Next construct the neighborhood and delay constant for the labels in $L_2$. By definition of $L_2$, if there exists a label $l_j$ in $L_2$ then there must exist some label $l_i$ in $L_1$ such that for every major path in $E_{l_j}$ there is an identical major path in $E_{l_i}$. Hence let $n_{l_j} = n_{l_i}$ and $d_{l_j} = d_{l_i}$.

5. For every $l_j$ in $L_3$, let the neighborhood relation imposed by label $l_j$ on processors in $N$ be empty and hence no processor's output port labelled $l_j$ is connected to the input port labelled $l_j$ of any processor.

6. Construct the function TA which assigns computation vertices to time steps. Let $v_s$ be the computation vertex which is in $D_1 \in D$ and $D^c_1 \in D^c$. Let $TA(v_s) = t_0$. Let $v_x$ be any computation vertex in $D_p \in D$ and $D^c_q \in D^c$. Then, let $TA(v_x) = t_0 + (q-1)\, d_a + (p-1)\, d_b$.
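The arithmetic in steps 2, 3 and 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the threshold for $d_b$ is read as $c_{min} \ge 0$, and the label delay $d_{l_j} = m_{l_j} d_b + c_{l_j} d_a$ is taken from the derivation that follows the construction.

```python
def choose_delay_constants(w, has_transitive_zero_edge, c_min):
    """Step 3: pick the two base delay constants d_a and d_b."""
    d_a = 2 if (w == (1, -1) or has_transitive_zero_edge) else 1
    d_b = 1 if c_min >= 0 else 1 + abs(c_min) * d_a
    return d_a, d_b


def label_delay(m_j, c_j, d_a, d_b):
    """Delay constant of a label with consistency constants m_j and c_j."""
    return m_j * d_b + c_j * d_a


def processor_assignment(p):
    """Step 2: a vertex on main diagonal D_p goes to processor p."""
    return p


def time_assignment(p, q, d_a, d_b, t0=0):
    """Step 6: a vertex on D_p and complementary diagonal D^c_q."""
    return t0 + (q - 1) * d_a + (p - 1) * d_b


# Example with w = <1,-1>, no transitive edge with m = 0, and c_min = 0.
d_a, d_b = choose_delay_constants((1, -1), False, 0)
# An edge from (p, q) = (1, 1) to (r, s) = (2, 2) has m_j = 1 and c_j = 1.
elapsed = time_assignment(2, 2, d_a, d_b) - time_assignment(1, 1, d_a, d_b)
print(d_a, d_b, elapsed, label_delay(1, 1, d_a, d_b))  # 2 1 3 3
```

The elapsed time across the edge equals the label's delay constant, which is the constancy property the proof establishes next.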

Steps 1 to 6 described above complete the construction of a correct mapping, which we establish as follows.

We begin by showing that for any label $l_j$, $n_{l_j}$ and $d_{l_j}$ are constants. Consider an edge labelled $l_j$ from $v_x$ to $v_y$ and let $v_x$ be in $D_p$ and $D^c_q$ and $v_y$ in $D_r$ and $D^c_s$ respectively.

Now

$$\Delta D(v_x, v_y) = \Delta P(v_x, v_y) = r - p = m_{l_j} \in \{1, -1, 0\} = n_{l_j}.$$

Next

$$\Delta T(v_x, v_y) = (s - q)\, d_a + (r - p)\, d_b = \Delta D^c(v_x, v_y)\, d_a + \Delta D(v_x, v_y)\, d_b = m_{l_j} d_b + c_{l_j} d_a.$$


Next we show that for any $l_j$ if $n_{l_j} = 0$, then all the vertices mapped onto the same processor belong to the same major path labelled $l_j$. Suppose $n_{l_j} = 0$. Then $m_{l_j} = 0$. Consider any $v_x$ and $v_y$ such that $\Delta D(v_x, v_y) = 0$. Then $\Delta P(v_x, v_y) = 0$ and so $c_{l_j} \Delta D(v_x, v_y) = m_{l_j} \Delta D^c(v_x, v_y)$. But by condition (2) of the Theorem there must be a major path labelled $l_j$ passing through $v_x$ and $v_y$. So whenever $n_{l_j} = 0$ and $PA(v_x) = PA(v_y)$, there is always a major path labelled $l_j$ passing through $v_x$ and $v_y$.


We next show that no two values appear simultaneously at the input port of any processor. We have shown that for any label $l_j$, if $n_{l_j} = 0$ then vertices mapped onto the same processor all belong to the same major path labelled $l_j$ and hence no two values of any two distinct major paths labelled $l_j$ appear simultaneously at the input port labelled $l_j$ of any processor. So we need to consider only major paths labelled $l_j$ whose neighborhood constant $n_{l_j} \in \{1, -1\}$. Let $n_{l_j} = 1$ and let $q_1$ and $q_2$ be two major paths whose input or output values appear simultaneously at the input port of some processor in the array. Clearly the input values associated with these two paths must be fed simultaneously at the external input port associated with label $l_j$ and let $t$ denote this time. Let $v_x$ and $v_y$ be the first vertices of $q_1$ and $q_2$ respectively. The time taken by the input value of $q_1$ to reach $PA(v_x)$ is $t + PA(v_x)\, d_{l_j}$ and the time taken by the input value of $q_2$ to reach $PA(v_y)$ is $t + PA(v_y)\, d_{l_j}$. Without loss of generality let $PA(v_y) > PA(v_x)$ and hence,

$$\Delta T(v_x, v_y) = \Delta P(v_x, v_y)\, d_{l_j}$$

and so

$$\Delta T(v_x, v_y)\, n_{l_j} = \Delta P(v_x, v_y)\, d_{l_j}.$$

Now $m_{l_j} = n_{l_j}$ and $d_{l_j} = m_{l_j} d_b + c_{l_j} d_a$ and so

$$(\Delta D^c(v_x, v_y)\, d_a + \Delta D(v_x, v_y)\, d_b)\, m_{l_j} = \Delta D(v_x, v_y)\,(m_{l_j} d_b + c_{l_j} d_a)$$

and hence $\Delta D^c(v_x, v_y)\, m_{l_j} = \Delta D(v_x, v_y)\, c_{l_j}$. But by condition (2) of the Theorem, $q_1$ and $q_2$ must be the same major path labelled $l_j$. We can arrive at a similar contradiction when $n_{l_j} = -1$.

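Lastly we show that $d_{l_j} > 0$ for any label $l_j$. Consider the case when $w = \langle 1,-1 \rangle$. So $w^c = \langle 0,1 \rangle$. By construction $d_a > 0$ and $d_b > 0$. Now $d_{l_j} = m_{l_j} d_b + 2 c_{l_j}$. We will show that $\forall l_j$, $c_{l_j} \ge 0$. Let $v_x$ and $v_y$ be vertices such that there is an edge labelled $l_j$ from $v_x$ to $v_y$. So $\Delta D(v_x, v_y) = (y_{l_\mu} - x_{l_\mu}) - (y_{l_\nu} - x_{l_\nu}) = m_{l_j}$. $w^c = \langle 0,1 \rangle$ and hence $\Delta D^c(v_x, v_y) = y_{l_\nu} - x_{l_\nu} = c_{l_j}$.

Suppose $c_{l_j} < 0$, that is, $y_{l_\nu} - x_{l_\nu} < 0$. Then $m_{l_j} \in \{1, -1, 0\}$ and hence $y_{l_\mu} - x_{l_\mu} \le 0$ and so by Lemma A.13, there must be a directed path from $v_y$ to $v_x$ causing a cycle. So $\forall l_j$, $c_{l_j} \ge 0$. Hence $d_b = 1$ and $d_{l_j} = m_{l_j} + 2 c_{l_j}$.

If $c_{l_j} = 0$ then we will show that $m_{l_j} > 0$. Suppose $m_{l_j} < 0$. Then $\Delta D(v_x, v_y) < 0$. $c_{l_j} = 0$ and hence $y_{l_\nu} = x_{l_\nu}$ and hence $y_{l_\mu} - x_{l_\mu} < 0$. So by Lemma A.13, there must be a directed path from $v_y$ to $v_x$ causing a cycle. So $m_{l_j} > 0$ and hence $d_{l_j} > 0$. $m_{l_j} \in \{1, -1\}$ and hence if $c_{l_j} > 0$ then $d_{l_j} \ge 1$.

For the cases $w = \langle 0,1 \rangle$ and $w = \langle 1,0 \rangle$ we can show by Lemma A.13 that (a) if $c_{l_j} \le 0$ then $m_{l_j} > 0$ and (b) if $m_{l_j} < 0$ and $c_{min} < 0$ then $c_{l_j} > 1 + |c_{min}|$. For both these cases we can easily show that $d_{l_j} > 0$.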

□ 


A.4  Correctness  of  Mapping  Cube  Graphs 

We  had  provided  a  technique  for  mapping  Cube  Graphs  onto  linear  arrays.  Herein  we  establish  that  the 
mapping  is  syntactically  correct.  We  begin  by  first  showing  that  the  mapping  preserves  the  neighborhood 
constant  of  the  labels. 

Theorem A.2: Let $l \in L_G$ and let $n_l$ and $d_l$ be its neighborhood and delay constants respectively. Then, if $e = \langle v_x, v_y \rangle$ is the directed edge from $v_x$ to $v_y$ and its label is $l$ then $PA(v_y) = PA(v_x) + n_l$.

Proof: Let $v_x$ and $v_y$ be the vertices in diagonals $D_p$ and $D_q$ respectively and $w_p$ and $w_q$ be the weights of $D_p$ and $D_q$ respectively. So,

$$w_1 x_{l_1} + w_2 x_{l_2} + w_3 x_{l_3} = w_p$$

and

$$w_1 y_{l_1} + w_2 y_{l_2} + w_3 y_{l_3} = w_q.$$

Let $l = l_1$. Since $e = \langle v_x, v_y \rangle$ and the label of $e$ is $l_1$ it follows from the definition of Cube Graph that $y_{l_1} = x_{l_1} + 1$, $y_{l_2} = x_{l_2}$ and $y_{l_3} = x_{l_3}$. Consequently, $w_q - w_p = w_1 = 1$. Now $p$ and $q$ are the indices of $D_p$ and $D_q$ respectively. We next show that $q = p+1$. Suppose $q \ne p+1$. Let $D_r$ be a diagonal distinct from $D_p$ and $D_q$ such that $w_p < w_r < w_q$. Since $w_p$, $w_r$ and $w_q$ are integers, it follows that $w_r - w_p \ge 1$ and $w_q - w_r \ge 1$ and hence $w_q - w_p \ge 2$. But $w_q - w_p = w_1 = 1$, a contradiction. So $q = p + 1 = p + w_1$.

The mapping algorithm maps vertices in $D_p$ onto processor $p$ and those of $D_q$ onto processor $p + w_1$ and hence $PA(v_y) = PA(v_x) + w_1$. Also from the mapping algorithm $n_{l_1} = w_1$. So the theorem holds for $l = l_1$. Similarly we can show that the theorem also holds when $l = l_2$ and $l = l_3$.

□ 

We  next  show  that  the  mapping  preserves  the  delay  constant  of  every  label  l. 

Theorem A.3: Let $l \in L_G$ and let $n_l$ and $d_l$ be its neighborhood and delay constants respectively. Then if $e = \langle v_x, v_y \rangle$ is the directed edge from $v_x$ to $v_y$ and its label is $l$ then $TA(v_y) = TA(v_x) + d_l$.

Proof: (A) Let $l \in \{l_1, l_2\}$. Clearly, $v_x$, $v_y$ and $e$ are all in the same mesh graph within the same set in CG, say $CG_i$. So $y_{l_3} - x_{l_3} = 0$ and from the mapping algorithm,

$$TA(v_y) - TA(v_x) = (y_{l_1} - x_{l_1})\, d_{l_1} + (y_{l_2} - x_{l_2})\, d_{l_2}.$$

1. Let the label of $e$ be $l_1$ and so $y_{l_2} - x_{l_2} = 0$ and $y_{l_1} - x_{l_1} = 1$ and hence, $TA(v_y) - TA(v_x) = d_{l_1}$.

2. Let the label of $e$ be $l_2$ and so $y_{l_1} - x_{l_1} = 0$ and $y_{l_2} - x_{l_2} = 1$ and hence, $TA(v_y) - TA(v_x) = d_{l_2}$.

(B) Let the label of $e$ be $l_3$. So $y_{l_3} - x_{l_3} = 1$, $y_{l_2} - x_{l_2} = 0$ and $y_{l_1} - x_{l_1} = 0$. Let $v_x$ be a vertex in a mesh graph in $CG_i$. Clearly, $v_y$ must be a vertex in some mesh graph in $CG_{i+1}$. From phase 3 of the mapping algorithm it can be shown that $TA(v_y) - TA(v_x) = d_{l_3}$.

From (A) and (B) above the theorem follows.

□ 

Lemma A.14: Let $l \in L_G$ and $n_l \in \{1, -1\}$. Let $P_1$ and $P_2$ be two distinct major paths labelled $l$ and let $v_x$ and $v_y$ be the first computation vertices in $P_1$ and $P_2$ respectively. Let $PA(v_x) = s_1$, $PA(v_y) = s_2$, $TA(v_x) = t_1$ and $TA(v_y) = t_2$. If the input/output values represented by source and sink vertices of $P_1$ and $P_2$ appear simultaneously at the input port of a processor then $(t_2 - t_1)\, n_l = (s_2 - s_1)\, d_l$.

Proof: Assume without loss of generality that the input values represented by the source vertices of $P_1$ and $P_2$ appear simultaneously at the input port of processor $s$.

1. Let $n_l = 1$. The input port labelled $l$ of processor 1 is the external input port through which the input values represented by source vertices labelled $l$ are fed in. The input values represented by the sources of the major paths $P_1$ and $P_2$ pass through intermediate processors ranging from 1 to $s_1$ and 1 to $s_2$ respectively; $s$ is one such intermediate processor. Let $t$ be the time at which both the values appear at the input port labelled $l$ of $s$. The time taken by the input value represented by the source vertex of $P_1$ to reach the input port labelled $l$ of $s_1$ is $(s_1 - s)\, d_l + t$, which is $TA(v_x)$. Similarly the time taken by the input value represented by the source vertex of $P_2$ to reach the input port labelled $l$ of $s_2$ is $(s_2 - s)\, d_l + t$, which is $TA(v_y)$, and hence $t_2 - t_1 = (s_2 - s_1)\, d_l$ and so

$$(t_2 - t_1)\, n_l = (s_2 - s_1)\, d_l.$$

2. Let $n_l = -1$. The input port labelled $l$ of processor $|N|$ is the external input port. So the input value represented by the source vertex of $P_1$ travels from $|N|$ to $s_1$ passing through the intermediate processor $s$ and the input value represented by the source vertex of $P_2$ travels from $|N|$ to $s_2$ passing through $s$. Let $t$ be the time at which both these input values reach $s$. The time taken to reach $s_1$ by the input value represented by the source vertex of $P_1$ is $t + (s - s_1)\, d_l$ and the time taken to reach $s_2$ by the input value represented by the source vertex of $P_2$ is $t + (s - s_2)\, d_l$ and hence $t_2 - t_1 = (s_1 - s_2)\, d_l$ and so

$$(t_2 - t_1)\, n_l = (s_2 - s_1)\, d_l.$$

From (1) and (2) the lemma follows.


□ 
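The timing argument of Lemma A.14 can be replayed with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and it assumes, as in the proof, that both values are seen at processor $s$ at time $t$ and then continue at $d_l$ time units per processor in the direction given by $n_l$.

```python
def first_vertex_times(s, t, s1, s2, d, n):
    """Arrival times t1, t2 at processors s1 and s2.

    Both values are at processor s at time t. For n == 1 they move
    towards higher processor numbers, for n == -1 towards lower ones,
    taking d time units per processor.
    """
    if n == 1:
        return t + (s1 - s) * d, t + (s2 - s) * d
    if n == -1:
        return t + (s - s1) * d, t + (s - s2) * d
    raise ValueError("n must be 1 or -1")


def lemma_a14_relation_holds(s, t, s1, s2, d, n):
    """Check (t2 - t1) * n == (s2 - s1) * d for the given scenario."""
    t1, t2 = first_vertex_times(s, t, s1, s2, d, n)
    return (t2 - t1) * n == (s2 - s1) * d


print(lemma_a14_relation_holds(s=2, t=10, s1=4, s2=7, d=3, n=1))   # True
print(lemma_a14_relation_holds(s=9, t=10, s1=4, s2=7, d=3, n=-1))  # True
```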

We  next  show  that  the  mapping  ensures  that  no  two  input/output  values  appear  simultaneously  at  the 
input  port  of  any  processor. 

Theorem A.4: Let $l \in \{l_1, l_2, l_3\}$. Let $P_1$ and $P_2$ be two distinct major paths labelled $l$. The mapping ensures that the input/output values represented by the source/sink vertices of $P_1$ and $P_2$ never appear simultaneously at the input port labelled $l$ of any processor.

Proof: Let $l = l_1$ and $v_x$ and $v_y$ be the first computation vertices of $P_1$ and $P_2$ respectively. From the mapping algorithm we obtain,

$$PA(v_y) - PA(v_x) = \Delta P = k_1 n_{l_1} + k_2 n_{l_2} + k_3 n_{l_3}$$

$$TA(v_y) - TA(v_x) = \Delta T = k_1 d_{l_1} + k_2 d_{l_2} + k_3 d_{l_3}$$

where $k_1 = (y_{l_1} - x_{l_1})$ and $-h_1 < k_1 < h_1$, $k_2 = (y_{l_2} - x_{l_2})$ and $-h_2 < k_2 < h_2$, $k_3 = (y_{l_3} - x_{l_3})$ and $-h_3 < k_3 < h_3$.

Assume that the input/output values represented by the source/sink vertices of $P_1$ and $P_2$ appear simultaneously at the input port labelled $l_1$ of a processor. By Lemma A.14,

$$d_{l_1} \Delta P = n_{l_1} \Delta T \qquad (*)$$

We next show that (*) cannot be satisfied.

1. Let $n_{l_2} = 1$ and so by the mapping algorithm, $d_{l_1} = 1$ and $d_{l_2} = 2$. $P_1$ and $P_2$ are distinct major paths labelled $l_1$ and so $k_2$ and $k_3$ are not both 0.

a. Let $h_1 - h_2 + n_{l_3} \ge 0$. So $d_{l_3} = h_1 + 1 + 2 n_{l_3}$ and (*) reduces to $k_3 (h_1 + 1 + n_{l_3}) + k_2 = 0$. Now $h_1 + 1 + n_{l_3} \ge 1$ and so $k_2 \ne 0$ and $k_3 \ne 0$. Besides $h_2 \le h_1 + n_{l_3}$ and $-h_2 < k_2 < h_2$ and so (*) cannot be satisfied.

b. Let $h_1 - h_2 + n_{l_3} < 0$ and so $d_{l_3} = h_2 + 1 + n_{l_3}$ and (*) reduces to $k_3 (h_2 + 1) + k_2 = 0$. Now $h_2 \ge 1$ and so $k_2 \ne 0$ and $k_3 \ne 0$. Besides $-h_2 < k_2 < h_2$ and so (*) cannot be satisfied.

2. Let $n_{l_2} = -1$. So $d_{l_1} = 1$ and $d_{l_2} = 1$.

a. Let $h_2 - h_1 + n_{l_3} \ge 0$ and so $d_{l_3} = 2 h_2 + 1 + n_{l_3}$. So (*) reduces to $2 k_2 + k_3 (2 h_2 + 1) = 0$. As $h_2 \ge 1$, so $2 h_2 + 1 \ge 3$ and so $k_2 \ne 0$ and $k_3 \ne 0$. Besides $-h_2 < k_2 < h_2$ and so $-(2 h_2 + 1) < 2 k_2 < 2 h_2 + 1$ and so (*) cannot be satisfied.

b. Let $h_2 - h_1 + n_{l_3} < 0$ and so $d_{l_3} = 2 h_1 + 1 - n_{l_3}$. So (*) reduces to $2 k_2 + k_3 (2 h_1 + 1 - 2 n_{l_3}) = 0$. Now $1 \le h_2 < h_1 - n_{l_3}$. So $2 h_1 + 1 - 2 n_{l_3} > 1$ and hence $k_2 \ne 0$ and $k_3 \ne 0$. Besides $-h_2 < k_2 < h_2$ and so $-(2 h_1 + 1 - 2 n_{l_3}) < 2 k_2 < 2 h_1 + 1 - 2 n_{l_3}$ and hence (*) cannot be satisfied.

A similar proof can be used to show that the theorem holds for $l = l_2$ and $l = l_3$.

□ 
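The case analysis for $l = l_1$ can be checked by exhaustive search over small cube sizes. The sketch below is an illustration under assumptions rather than the paper's procedure: it takes the reduced forms of (*) in the four cases as $c_2 k_2 + c_3 k_3 = 0$, assumes $n_{l_3} \in \{1, -1, 0\}$, and uses hypothetical function names.

```python
def reduced_coefficients(n2, n3, h1, h2):
    """Return (c2, c3) such that (*) reads c2*k2 + c3*k3 == 0."""
    if n2 == 1:
        if h1 - h2 + n3 >= 0:
            return 1, h1 + 1 + n3          # case 1(a)
        return 1, h2 + 1                   # case 1(b)
    if n2 == -1:
        if h2 - h1 + n3 >= 0:
            return 2, 2 * h2 + 1           # case 2(a)
        return 2, 2 * h1 + 1 - 2 * n3      # case 2(b)
    raise ValueError("n2 must be 1 or -1")


def star_has_solution(n2, n3, h1, h2, h3):
    """Search for (k2, k3) != (0, 0) in range that satisfies (*)."""
    c2, c3 = reduced_coefficients(n2, n3, h1, h2)
    for k2 in range(-h2 + 1, h2):          # -h2 < k2 < h2
        for k3 in range(-h3 + 1, h3):      # -h3 < k3 < h3
            if (k2, k3) != (0, 0) and c2 * k2 + c3 * k3 == 0:
                return True
    return False


# Exhaustive check for cube dimensions up to 5 in each direction.
conflict_found = any(
    star_has_solution(n2, n3, h1, h2, h3)
    for n2 in (1, -1)
    for n3 in (-1, 0, 1)
    for h1 in range(1, 6)
    for h2 in range(1, 6)
    for h3 in range(1, 6)
)
print(conflict_found)  # False
```

In every case the coefficient of $k_3$ exceeds the largest possible magnitude of the $k_2$ term, which is why the search finds no collision.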


